
TAD Parameter

Updated 2024.07.06

experimental_plan.yaml Explanation

  • To apply AI Contents to your data, you need to enter information about the data and the Contents features you want to use in the experimental_plan.yaml file.

  • When you install AI Contents, you can find a basic experimental_plan.yaml file, written for each content, under the solution folder.

  • By entering data information in this YAML file and modifying/adding user arguments provided by each asset, you can create a data analysis model with the desired settings when running ALO.

1. File Overview


experimental_plan.yaml is a configuration file that defines the experiment plan for TAD, including the data path and parameters to be used in various pipeline stages. This file allows you to automate the data preprocessing, model training, and deployment process.

2. Structure Explanation


experimental_plan.yaml consists of the following main sections (a minimal skeleton is shown after the list):

  • External data path settings (external_path)
  • User parameter settings (user_parameters)
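
The skeleton below is a minimal sketch of the overall layout; the paths are the sample values used later in this page, and the pipeline contents are abbreviated to comments (the exact nesting may differ slightly by version).

external_path:
    - load_train_data_path: ./solution/sample_data/train/
    - load_inference_data_path: ./solution/sample_data/test/
user_parameters:
    - train_pipeline:
        # assets of the training pipeline (see section 4.1)
    - inference_pipeline:
        # assets of the inference pipeline (see section 4.2)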

3. External Data Path Settings (external_path)


Specifies the paths for loading data or saving results.

  • load_train_data_path: Specifies the path to load training data.
  • load_inference_data_path: Specifies the path to load inference data.
  • save_train_artifacts_path: Specifies the path to save training results.
  • save_inference_artifacts_path: Specifies the path to save inference results.
  • load_model_path: Specifies the path to load an existing model.
external_path:
    - load_train_data_path: ./solution/sample_data/train/
    - load_inference_data_path: ./solution/sample_data/test/
    - save_train_artifacts_path:
    - save_inference_artifacts_path:
    - load_model_path:
| Parameter Name | Default | Description and Options |
| --- | --- | --- |
| load_train_data_path | ./sample_data/train/ | Specifies the path to load training data. (Do not enter a csv file name.) All csv files under the entered path are concatenated. |
| load_inference_data_path | ./sample_data/test/ | Specifies the path to load inference data. (Do not enter a csv file name.) All csv files under the entered path are concatenated. |
| save_train_artifacts_path | - | Specifies the path to save training results. |
| save_inference_artifacts_path | - | Specifies the path to save inference results. |
| load_model_path | - | Specifies the path to load an existing model. |

All files in subfolders under the entered path will also be combined.

All column names of the files to be combined must be the same.

4. User Parameter Settings (user_parameters)


Pipeline & Asset

user_parameters defines the configuration parameters used in each pipeline stage. Pipelines are divided into train_pipeline and inference_pipeline, and each pipeline consists of several stages (Assets). Each Asset performs a specific data processing task and has parameters that control it; the sketch after this list shows how these pieces nest.

  • Pipeline: The top-level data processing flow, consisting of several stages (Assets).
  • Asset: A unit that performs an individual task within the pipeline, for example data preprocessing or model training.
  • args: Parameters that configure the operation of each Asset.
  • ui_args: Parameters that users can change in the AI Conductor UI.
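
The sketch below shows how these pieces nest, using the input asset as an example; the exact assets and arguments available for TAD are described in the following sections, and the precise nesting may differ slightly by version.

user_parameters:
    - train_pipeline:
        - step: input          # Asset (one stage of the pipeline)
          args:
            - file_type: csv   # user arguments controlling this Asset
              encoding: utf-8
          ui_args:             # arguments exposed in the AI Conductor UI
    - inference_pipeline:
        - step: input
          args:
            - none: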

User Arguments Explanation

What are User Arguments?

User arguments are parameters for setting the operation of each asset, used by entering them under args of each asset step in experimental_plan.yaml. AI Contents provides user arguments for each asset that makes up the pipeline so that users can apply various functions to their data. Users can refer to the guide below to change and add user arguments to model their data appropriately. User arguments are divided into "Required arguments" that are pre-written in experimental_plan.yaml and "Custom arguments" that users add by referring to the guide.

Required Arguments

  • Required arguments are the basic arguments shown directly in experimental_plan.yaml. Most required arguments have built-in default values, so they operate with the default value even if the user does not set one.
  • Among the required arguments in experimental_plan.yaml, users must set values for the data-related arguments (e.g., x_columns, y_column).

Custom Arguments

  • Custom arguments are not pre-written in experimental_plan.yaml; they are additional functions provided by each asset that users can add to experimental_plan.yaml. They are used by adding them under 'args' of the relevant asset step, as in the sketch below.
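
For example, the summary table at the end of this page lists drop_columns as a custom argument of the readiness asset; adding it to the pre-written readiness step would look like the sketch below (the column names are placeholders, and the set of custom arguments available depends on each asset's guide).

- step: readiness
  args:
    - x_columns: [x_col1, x_col2]
      y_column: ''
      groupkey_columns: ''
      drop_columns: [unused_col]   # custom argument added by the user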

TAD's pipeline is composed of Input - Readiness - Preprocess - Modeling (train/inference) assets, in that order, and user arguments are configured differently according to the function of each asset. First try the required user arguments pre-written in experimental_plan.yaml, then add custom user arguments to create a TAD model that fits your data.

4.1. Train Pipeline

Defines the settings needed for the training pipeline.

4.1.1. Input Asset

Defines settings related to the input path of training data.

- step: input
  args:
    - file_type: csv
      encoding: utf-8
  ui_args:

4.1.2. Readiness Asset

Defines the columns of training data.

- step: readiness
  args:
    - x_columns: [factor0, factor1, factor2, ..]
      y_column: ''
      groupkey_columns: ''
  ui_args:
    - x_columns
    - y_column

4.1.3. Preprocess Asset

Defines data preprocessing settings.

- step: preprocess
  args:
    - handling_missing: fill_0
      handling_scaling_x: standard
      drop_duplicate_time: False
      handling_downsampling_interval: 0
      downsampling_method: median
      difference_interval: 0
  ui_args:
    - handling_missing
    - handling_scaling_x

4.1.4. Train Asset

Defines settings related to model training.

- step: train
  args:
    - hpo_param: False
      contamination: ''
      models:
        - knn
        - dbscan
        - ocsvm
        - lof
        - isf
      visualization: False
  ui_args:
    - hpo_param
    - contamination
    - models

4.2. Inference Pipeline

Defines the settings needed for the inference pipeline.

4.2.1. Input Asset

Defines settings related to the input path of inference data.

- step: input
  args:
    - none:

4.2.2. Readiness Asset

Defines the column settings for inference data.

- step: readiness
  args:
    - none:

4.2.3. Preprocess Asset

Defines preprocessing settings for inference data.

- step: preprocess
  args:
    - none:

4.2.4. Inference Asset

Defines settings for performing inference using the model.

- step: inference
  args:
    - none:

5. Detailed Explanation of User Arguments


Input Asset

file_type

Enter the file extension of the Input data. Currently, AI Solution development is only possible with csv files.

  • Argument type: Required
  • Input type
    • string
  • Possible values
    • csv (default)
  • Usage
    • file_type: csv
  • ui_args: X

encoding

Enter the encoding type of the Input data. Currently, AI Solution development is only possible with utf-8 encoding.

  • Argument type: Required
  • Input type
    • string
  • Possible values
    • utf-8 (default)
  • Usage
    • encoding: utf-8
  • ui_args: X

Readiness Asset

x_columns

Enter the columns containing the data you want to use for anomaly detection. Multiple columns are supported.

  • Argument type: Required
  • Input type
    • list
  • Possible values
    • Column names
  • Usage
    • x_columns : [ x_col1, x_col2 ]
  • ui_args: O

y_column

Enter the column containing the label of each data point. Since TAD does not require labels by default, enter this only if you want to obtain results that use labels. The number of unique label values must be fewer than 3.

  • Argument type: Custom
  • Input type
    • string
  • Possible values
    • Column name
  • Usage
    • y_column : y_col
  • ui_args: X

groupkey_columns

Enter the column containing information about which group each data point belongs to if you want to perform anomaly detection by group. If you don't want to proceed by group, leave it blank. Currently supports one group key column.

  • Argument type: Required
  • Input type
    • list
  • Possible values
    • Column name
  • Usage
    • groupkey_columns : [ groupkey_col_example ]
  • ui_args: X
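
A sketch of a readiness step that combines the arguments above (the column names are placeholders; y_column and groupkey_columns can be left blank if unused):

- step: readiness
  args:
    - x_columns: [x_col1, x_col2]
      y_column: y_col
      groupkey_columns: [ groupkey_col_example ]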

Preprocess asset

handling_missing

Determines how to handle missing values in the data used for anomaly detection. 'drop' removes rows containing missing values, 'most_frequent' fills with the mode, 'mean' with the average, 'median' with the median, and 'interpolation' with a value interpolated from the previous and next values.

  • Argument type: Custom
  • Input type
    • string
  • Possible values
    • drop (default)
    • drop
    • most_frequent
    • mean
    • median
    • interpolation
  • Usage
    • handling_missing : drop
  • ui_args: X

handling_scaling

Determines how to scale the data you want to perform anomaly detection on. If 'standard', it scales using the mean and std of the train data to have mean 0 and variance 1. If 'minmax', it adjusts the values to be between 0 and 1 using the min and max values of the train data. If 'maxabs', it adjusts the values to be between 0 and 1 using the maximum absolute value of the train data. If 'robust', it scales using the median and quartile values of the train data. If 'normalizer', it scales so that the length of the feature vector of the data becomes 1. If nothing is entered, no separate scaling is performed.

  • Argument type: Custom
  • Input type
    • string
  • Possible values
    • none (default)
    • standard
    • minmax
    • maxabs
    • robust
    • normalizer
  • Usage
    • handling_scaling : minmax
  • ui_args: X

drop_duplicate_time

Determines how to handle duplicate rows in the time column of the data you want to perform anomaly detection on. If True, it removes all but one of the rows with duplicate time columns.

  • Argument type: Custom
  • Input type
    • string
  • Possible values
    • True (default)
    • True
    • False
  • Usage
    • drop_duplicate_time : True
  • ui_args: X

difference_interval

This argument determines whether to apply differencing to the data before anomaly detection. It is used when you want to detect anomalies based on how much the current point has changed relative to earlier values. When used, enter a positive integer greater than 0; the default of 0 disables differencing.

  • Argument type: Custom
  • Input type
    • int
  • Possible values
    • 0 (default)
    • Integer 0 or greater (0 disables differencing)
  • Usage
    • difference_interval : 1
  • ui_args: X
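
A sketch of a preprocess step combining the arguments above; the values are illustrative only, and the key name handling_scaling_x follows the pre-written experimental_plan.yaml.

- step: preprocess
  args:
    - handling_missing: median      # fill missing values with the median
      handling_scaling_x: minmax    # scale each feature using the min/max of the train data
      drop_duplicate_time: True     # keep only one row per duplicated time value
      difference_interval: 1        # difference each point against the previous value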

Train asset

hpo_param

hpo_param determines whether to perform hyperparameter tuning for the anomaly detection models.

  • Argument type: Custom
  • Input type
    • string
  • Possible values
    • False (default)
    • True
    • False
  • Usage
    • hpo_param: True
  • ui_args: X

contamination

contamination sets the expected ratio of anomalies, or a range of ratios. Entering a range lets the HPO process search for the optimal ratio when the actual anomaly ratio is unknown; entering a single value fixes the model to a known ratio.

  • Argument type: Custom
  • Input type
    • float or list
  • Possible values
    • '' (default)
    • [lower bound, upper bound] entered as positive fractional values chosen by the user
    • A single positive fractional value chosen by the user
  • Usage
    • contamination: [0.001, 0.1]
    • contamination: 0.0001
  • ui_args: X

models

This is an item to select which models to use among the 5 built-in models. If two or more models are selected, the output is provided as an Ensemble result.

  • Argument type: Custom
  • Input type
    • string select
  • Possible values
    • knn, ocsvm, lof, isf (default)
    • knn, ocsvm, lof, isf, dbscan
  • Usage
    • models:
      • knn
      • ocsvm
      • lof
      • isf
      • dbscan
  • ui_args: X

visualization

visualization is an item that determines whether to visualize the detection results of anomaly detection models.

  • Argument type: Custom
  • Input type
    • string
  • Possible values
    • False (default)
    • True
    • False
  • Usage
    • visualization: True
  • ui_args: X
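
A sketch of a train step that enables hyperparameter tuning with a contamination search range; the values are illustrative only.

- step: train
  args:
    - hpo_param: True               # perform hyperparameter tuning
      contamination: [0.001, 0.1]   # search range for the anomaly ratio
      models:
        - knn
        - lof
        - isf
      visualization: True           # visualize the detection results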

Inference Asset

  • none: Does not specify separate settings for inference.

User Arguments Summary

The table below summarizes the user arguments for each asset:

| Asset Name | Argument type | Argument Name | Default | Description | User Setting Required | ui_args |
| --- | --- | --- | --- | --- | --- | --- |
| Input | Required | file_type | csv | Enter the file extension of the input data. | X | O |
| Input | Required | encoding | utf-8 | Enter the encoding type of the input data. | X | O |
| Readiness | Required | x_columns | - | Enter the names of the x columns for training. | O | O |
| Readiness | Required | y_column | - | Enter the name of the y column. | O | O |
| Readiness | Custom | groupkey_columns | - | Groups the dataframe based on the values of the entered column. | X | O |
| Readiness | Custom | drop_columns | - | Specifies columns to exclude. | X | X |
| Readiness | Custom | time_column | - | Specifies the time column. | X | X |
| Readiness | Custom | concat_dataframes | True | Specifies whether to merge dataframes. | X | X |
| Preprocess | Custom | handling_missing | See description | Specifies the missing value handling method to apply to columns. | X | X |
| Preprocess | Custom | handling_scaling_x | standard | Specifies the feature scaling method. | X | X |
| Preprocess | Custom | drop_duplicate_time | False | Specifies whether to remove duplicate times. | X | X |
| Preprocess | Custom | difference_interval | 0 | Specifies whether to perform differencing. | X | X |
| Train | Required | hpo_param | False | Specifies whether to perform hyperparameter optimization. | X | O |
| Train | Required | models | Select from knn, ocsvm, lof, isf, dbscan | Specifies which models to use. (Enter "all" to select all models.) | X | O |
| Train | Required | visualization | False | Specifies whether to perform visualization. | X | O |


TAD Version: 1.0.0