
TAD Parameter

Updated 2024.07.06

experimental_plan.yaml Explanation

  • To apply AI Contents to your data, you need to enter information about the data and the Contents features you want to use in the experimental_plan.yaml file.

  • When you install AI Contents, you can find a basic experimental_plan.yaml file, written for each content, under the solution folder.

  • By entering data information in this YAML file and modifying/adding user arguments provided by each asset, you can create a data analysis model with the desired settings when running ALO.

1. File Overview


experimental_plan.yaml is a configuration file that defines the experiment plan for TAD, including the data path and parameters to be used in various pipeline stages. This file allows you to automate the data preprocessing, model training, and deployment process.

2. Structure Explanation


experimental_plan.yaml consists of the following main sections (a minimal skeleton is shown after the list):

  • External data path settings (external_path)
  • User parameter settings (user_parameters)
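
The skeleton below is a minimal sketch of the overall layout; the paths are the sample values used later in this page, and the pipeline contents are abbreviated to comments (the exact nesting may differ slightly by version).

external_path:
    - load_train_data_path: ./solution/sample_data/train/
    - load_inference_data_path: ./solution/sample_data/test/
user_parameters:
    - train_pipeline:
        # assets of the training pipeline (see section 4.1)
    - inference_pipeline:
        # assets of the inference pipeline (see section 4.2)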

3. External Data Path Settings (external_path)


Specifies the paths for loading data or saving results.

  • load_train_data_path: Specifies the path to load training data.
  • load_inference_data_path: Specifies the path to load inference data.
  • save_train_artifacts_path: Specifies the path to save training results.
  • save_inference_artifacts_path: Specifies the path to save inference results.
  • load_model_path: Specifies the path to load an existing model.
external_path:
    - load_train_data_path: ./solution/sample_data/train/
    - load_inference_data_path: ./solution/sample_data/test/
    - save_train_artifacts_path:
    - save_inference_artifacts_path:
    - load_model_path:
| Parameter Name | Default | Description and Options |
| --- | --- | --- |
| load_train_data_path | ./sample_data/train/ | Specifies the path to load training data. (Do not enter a csv file name.) All csv files under the entered path are concatenated. |
| load_inference_data_path | ./sample_data/test/ | Specifies the path to load inference data. (Do not enter a csv file name.) All csv files under the entered path are concatenated. |
| save_train_artifacts_path | - | Specifies the path to save training results. |
| save_inference_artifacts_path | - | Specifies the path to save inference results. |
| load_model_path | - | Specifies the path to load an existing model. |

All files in subfolders under the entered path will also be combined.

All column names of the files to be combined must be the same.

4. User Parameter Settings (user_parameters)


Pipeline & Asset

user_parameters defines the configuration parameters used in each pipeline stage. Pipelines are divided into train_pipeline and inference_pipeline, and each pipeline consists of several stages (Assets). Each Asset performs a specific data processing task and has parameters that control it; the sketch after this list shows how these pieces nest.

  • Pipeline: The top-level data processing flow, consisting of several stages (Assets).
  • Asset: A unit that performs an individual task within the pipeline, for example data preprocessing or model training.
  • args: Parameters that configure the operation of each Asset.
  • ui_args: Parameters that users can change in the AI Conductor UI.
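
The sketch below shows how these pieces nest, using the input asset as an example; the exact assets and arguments available for TAD are described in the following sections, and the precise nesting may differ slightly by version.

user_parameters:
    - train_pipeline:
        - step: input          # Asset (one stage of the pipeline)
          args:
            - file_type: csv   # user arguments controlling this Asset
              encoding: utf-8
          ui_args:             # arguments exposed in the AI Conductor UI
    - inference_pipeline:
        - step: input
          args:
            - none: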

User Arguments Explanation

What are User Arguments?

User arguments are parameters for setting the operation of each asset, used by entering them under args of each asset step in experimental_plan.yaml. AI Contents provides user arguments for each asset that makes up the pipeline so that users can apply various functions to their data. Users can refer to the guide below to change and add user arguments to model their data appropriately. User arguments are divided into "Required arguments" that are pre-written in experimental_plan.yaml and "Custom arguments" that users add by referring to the guide.

Required Arguments

  • Required arguments are the basic arguments shown directly in experimental_plan.yaml. Most required arguments have built-in default values, so they operate with the default value even if the user does not set one.
  • Among the required arguments in experimental_plan.yaml, users must set values for the data-related arguments (e.g., x_columns, y_column).

Custom Arguments

  • Custom arguments are not pre-written in experimental_plan.yaml; they are additional functions provided by each asset that users can add to experimental_plan.yaml. They are used by adding them under 'args' of the relevant asset step, as in the sketch below.
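
For example, the summary table at the end of this page lists drop_columns as a custom argument of the readiness asset; adding it to the pre-written readiness step would look like the sketch below (the column names are placeholders, and the set of custom arguments available depends on each asset's guide).

- step: readiness
  args:
    - x_columns: [x_col1, x_col2]
      y_column: ''
      groupkey_columns: ''
      drop_columns: [unused_col]   # custom argument added by the user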

TAD's pipeline is composed of Input - Readiness - Preprocess - Modeling (train/inference) assets, in that order, and user arguments are configured differently according to the function of each asset. First try the required user arguments pre-written in experimental_plan.yaml, then add custom user arguments to create a TAD model that fits your data.

4.1. Train Pipeline

Defines the settings needed for the training pipeline.

4.1.1. Input Asset

Defines settings related to the input path of training data.

- step: input
  args:
    - file_type: csv
      encoding: utf-8
  ui_args:

4.1.2. Readiness Asset

Defines the columns of training data.

- step: readiness
  args:
    - x_columns: [factor0, factor1, factor2, ..]
      y_column: ''
      groupkey_columns: ''
  ui_args:
    - x_columns
    - y_column

4.1.3. Preprocess Asset

Defines data preprocessing settings.

- step: preprocess
  args:
    - handling_missing: fill_0
      handling_scaling_x: standard
      drop_duplicate_time: False
      handling_downsampling_interval: 0
      downsampling_method: median
      difference_interval: 0
  ui_args:
    - handling_missing
    - handling_scaling_x

4.1.4. Train Asset

Defines settings related to model training.

- step: train
  args:
    - hpo_param: False
      contamination: ''
      models:
        - knn
        - dbscan
        - ocsvm
        - lof
        - isf
      visualization: False
  ui_args:
    - hpo_param
    - contamination
    - models

4.2. Inference Pipeline

Defines the settings needed for the inference pipeline.

4.2.1. Input Asset

Defines settings related to the input path of inference data.

- step: input
  args:
    - none:

4.2.2. Readiness Asset

Defines the column settings for inference data.

- step: readiness
  args:
    - none:

4.2.3. Preprocess Asset

Defines preprocessing settings for inference data.

- step: preprocess
  args:
    - none:

4.2.4. Inference Asset

Defines settings for performing inference using the model.

- step: inference
  args:
    - none:

5. Detailed Explanation of User Arguments


Input Asset

file_type

Enter the file extension of the Input data. Currently, AI Solution development is only possible with csv files.

  • Argument type: Required
  • Input type
    • string
  • Possible values
    • csv (default)
  • Usage
    • file_type: csv
  • ui_args: X

encoding

Enter the encoding type of the Input data. Currently, AI Solution development is only possible with utf-8 encoding.

  • Argument type: Required
  • Input type
    • string
  • Possible values
    • utf-8 (default)
  • Usage
    • encoding: utf-8
  • ui_args: X

Readiness Asset

x_columns

Enter the columns containing the data you want to use for anomaly detection. Multiple columns are supported.

  • Argument type: Required
  • Input type
    • list
  • Possible values
    • Column names
  • Usage
    • x_columns : [ x_col1, x_col2 ]
  • ui_args: O

y_column

Enter the column containing the label of each data point. Since TAD does not require labels by default, enter this only if you want to obtain results that use labels. The number of unique label values must be fewer than 3.

  • Argument type: Custom
  • Input type
    • string
  • Possible values
    • Column name
  • Usage
    • y_column : y_col
  • ui_args: X

groupkey_columns

Enter the column containing information about which group each data point belongs to if you want to perform anomaly detection by group. If you don't want to proceed by group, leave it blank. Currently supports one group key column.

  • Argument type: Required
  • Input type
    • list
  • Possible values
    • Column name
  • Usage
    • groupkey_columns : [ groupkey_col_example ]
  • ui_args: X
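
A sketch of a readiness step that combines the arguments above (the column names are placeholders; y_column and groupkey_columns can be left blank if unused):

- step: readiness
  args:
    - x_columns: [x_col1, x_col2]
      y_column: y_col
      groupkey_columns: [ groupkey_col_example ]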

Preprocess asset

handling_missing

Determines how to handle missing values in the data used for anomaly detection. 'drop' removes rows containing missing values, 'most_frequent' fills with the mode, 'mean' with the average, 'median' with the median, and 'interpolation' with a value interpolated from the previous and next values.

  • Argument type: Custom
  • Input type
    • string
  • Possible values
    • drop (default)
    • drop
    • most_frequent
    • mean
    • median
    • interpolation
  • Usage
    • handling_missing : drop
  • ui_args: X

handling_scaling

Determines how to scale the data you want to perform anomaly detection on. If 'standard', it scales using the mean and std of the train data to have mean 0 and variance 1. If 'minmax', it adjusts the values to be between 0 and 1 using the min and max values of the train data. If 'maxabs', it adjusts the values to be between 0 and 1 using the maximum absolute value of the train data. If 'robust', it scales using the median and quartile values of the train data. If 'normalizer', it scales so that the length of the feature vector of the data becomes 1. If nothing is entered, no separate scaling is performed.

  • Argument type: Custom
  • Input type
    • string
  • Possible values
    • none (default)
    • standard
    • minmax
    • maxabs
    • robust
    • normalizer
  • Usage
    • handling_scaling : minmax
  • ui_args: X

drop_duplicate_time

Determines how to handle duplicate rows in the time column of the data you want to perform anomaly detection on. If True, it removes all but one of the rows with duplicate time columns.

  • Argument type: Custom
  • Input type
    • string
  • Possible values
    • True (default)
    • True
    • False
  • Usage
    • drop_duplicate_time : True
  • ui_args: X

difference_interval

This argument determines whether to apply differencing to the data before anomaly detection. It is used when you want to detect anomalies based on how much the current point has changed relative to earlier values. When used, enter a positive integer greater than 0; the default of 0 disables differencing.

  • Argument type: Custom
  • Input type
    • int
  • Possible values
    • 0 (default)
    • Integer 0 or greater (0 disables differencing)
  • Usage
    • difference_interval : 1
  • ui_args: X
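
A sketch of a preprocess step combining the arguments above; the values are illustrative only, and the key name handling_scaling_x follows the pre-written experimental_plan.yaml.

- step: preprocess
  args:
    - handling_missing: median      # fill missing values with the median
      handling_scaling_x: minmax    # scale each feature using the min/max of the train data
      drop_duplicate_time: True     # keep only one row per duplicated time value
      difference_interval: 1        # difference each point against the previous value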

Train asset

hpo_param

hpo_param determines whether to perform hyperparameter tuning for the anomaly detection models.

  • Argument type: Custom
  • Input type
    • string
  • Possible values
    • False (default)
    • True
    • False
  • Usage
    • hpo_param: True
  • ui_args: X

contamination

contamination sets the expected ratio of anomalies, or a range of ratios. Entering a range lets the HPO process search for the optimal ratio when the actual anomaly ratio is unknown; entering a single value fixes the model to a known ratio.

  • Argument type: Custom
  • Input type
    • float or list
  • Possible values
    • '' (default)
    • [lower bound, upper bound] entered as positive fractional values chosen by the user
    • A single positive fractional value chosen by the user
  • Usage
    • contamination: [0.001, 0.1]
    • contamination: 0.0001
  • ui_args: X

models

This is an item to select which models to use among the 5 built-in models. If two or more models are selected, the output is provided as an Ensemble result.

  • Argument type: Custom
  • Input type
    • string select
  • Possible values
    • knn, ocsvm, lof, isf (default)
    • knn, ocsvm, lof, isf, dbscan
  • Usage
    • models:
      • knn
      • ocsvm
      • lof
      • isf
      • dbscan
  • ui_args: X

visualization

visualization is an item that determines whether to visualize the detection results of anomaly detection models.

  • Argument type: Custom
  • Input type
    • string
  • Possible values
    • False (default)
    • True
    • False
  • Usage
    • visualization: True
  • ui_args: X
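
A sketch of a train step that enables hyperparameter tuning with a contamination search range; the values are illustrative only.

- step: train
  args:
    - hpo_param: True               # perform hyperparameter tuning
      contamination: [0.001, 0.1]   # search range for the anomaly ratio
      models:
        - knn
        - lof
        - isf
      visualization: True           # visualize the detection results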

Inference Asset

  • none: Does not specify separate settings for inference.

User Arguments Summary

The table below summarizes the user arguments for each asset:

| Asset Name | Argument type | Argument Name | Default | Description | User Setting Required | ui_args |
| --- | --- | --- | --- | --- | --- | --- |
| Input | Required | file_type | csv | Enter the file extension of the input data. | X | O |
| Input | Required | encoding | utf-8 | Enter the encoding type of the input data. | X | O |
| Readiness | Required | x_columns | - | Enter the names of the x columns for training. | O | O |
| Readiness | Required | y_column | - | Enter the name of the y column. | O | O |
| Readiness | Custom | groupkey_columns | - | Groups the dataframe based on the values of the entered column. | X | O |
| Readiness | Custom | drop_columns | - | Specifies columns to exclude. | X | X |
| Readiness | Custom | time_column | - | Specifies the time column. | X | X |
| Readiness | Custom | concat_dataframes | True | Specifies whether to merge dataframes. | X | X |
| Preprocess | Custom | handling_missing | See description | Specifies the missing value handling method to apply to columns. | X | X |
| Preprocess | Custom | handling_scaling_x | standard | Specifies the feature scaling method. | X | X |
| Preprocess | Custom | drop_duplicate_time | False | Specifies whether to remove duplicate times. | X | X |
| Preprocess | Custom | difference_interval | 0 | Specifies whether to perform differencing. | X | X |
| Train | Required | hpo_param | False | Specifies whether to perform hyperparameter optimization. | X | O |
| Train | Required | models | Select from knn, ocsvm, lof, isf, dbscan | Specifies which models to use. (Enter "all" to select all models.) | X | O |
| Train | Required | visualization | False | Specifies whether to perform visualization. | X | O |


TAD Version: 1.0.0