AD Parameter
Description of experimental_plan.yaml
To apply AI Contents to your data, you need to input data information and the Contents features into the experimental_plan.yaml file. When you install AI Contents in the solution folder, you can find a default experimental_plan.yaml file for each content under the solution folder. By inputting 'data information' and modifying/adding 'user arguments' provided by each asset, you can run ALO to create a data analysis model with your desired settings.
Structure of experimental_plan.yaml
The experimental_plan.yaml file contains various settings required to run ALO. You can use AI Contents immediately by modifying the 'data path' and 'user arguments' sections of these settings.
Input Data Path (external_path)
- Parameters under external_path are used to specify the file paths for loading and saving files. If save_train_artifacts_path and save_inference_artifacts_path are not provided, the modeling outputs are saved in the default train_artifacts and inference_artifacts folders.
external_path:
- load_train_data_path: ./solution/sample_data/train
- load_inference_data_path: ./solution/sample_data/test
- save_train_artifacts_path:
- save_inference_artifacts_path:
Parameter Name | Default | Description and Options |
---|---|---|
load_train_data_path | ./sample_data/train/ | Enter the folder path where the training data is located (Do not enter the CSV file name) |
load_inference_data_path | ./sample_data/test/ | Enter the folder path where the inference data is located (Do not enter the CSV file name) |
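The fallback for empty save paths can be sketched as follows. This is a minimal illustration, not ALO's actual code: it assumes external_path has already been loaded into a Python list of single-key dicts, that an empty YAML value arrives as None, and the helper name resolve_artifact_paths and the ./my_outputs path are hypothetical.

```python
from pathlib import Path

# Default folder names taken from the description above.
DEFAULT_ARTIFACT_DIRS = {
    "save_train_artifacts_path": "train_artifacts",
    "save_inference_artifacts_path": "inference_artifacts",
}


def resolve_artifact_paths(external_path: list[dict]) -> dict[str, Path]:
    """Return the save path for each artifact key, using the default when empty."""
    # Flatten the list of single-key mappings into one dict.
    merged = {key: value for entry in external_path for key, value in entry.items()}
    resolved = {}
    for key, default_dir in DEFAULT_ARTIFACT_DIRS.items():
        value = merged.get(key)
        resolved[key] = Path(value) if value else Path(default_dir)
    return resolved


example = [
    {"load_train_data_path": "./solution/sample_data/train"},
    {"load_inference_data_path": "./solution/sample_data/test"},
    {"save_train_artifacts_path": None},  # left empty in the YAML
    {"save_inference_artifacts_path": "./my_outputs"},  # hypothetical custom path
]
paths = resolve_artifact_paths(example)
```

Here the train artifacts fall back to the train_artifacts folder, while the inference artifacts go to the custom path.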
User Parameters (user_parameters)
user_parameters contains the asset name under step. Below, step: input indicates the input asset stage. args contains user arguments for the input asset (step: input). User arguments are setting parameters for data analysis provided by each asset. For more details, see the User Arguments section below.
user_parameters:
    - train_pipeline:
        - step: input
          args:
            - file_type
            ...
          ui_args:
            ...
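To make the nesting concrete, the sketch below looks up the args of one step. It assumes the YAML has already been loaded into Python lists and dicts and that args is a list of mappings. The helper name get_step_args and the sample plan are hypothetical.

```python
def get_step_args(user_parameters: list[dict], pipeline: str, step: str) -> dict:
    """Collect the args of one asset step into a single dict."""
    for pipeline_entry in user_parameters:
        for step_entry in pipeline_entry.get(pipeline, []):
            if step_entry.get("step") == step:
                merged = {}
                # args is assumed to be a list of mappings; merge them in order.
                for arg in step_entry.get("args") or []:
                    merged.update(arg)
                return merged
    raise KeyError(f"step {step!r} not found in {pipeline!r}")


# Hypothetical, already-parsed user_parameters section.
plan = [
    {
        "train_pipeline": [
            {"step": "input", "args": [{"file_type": "csv", "encoding": "utf-8"}]},
        ]
    }
]
input_args = get_step_args(plan, "train_pipeline", "input")
```

With this sample plan, input_args holds the file_type and encoding values of the input asset.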
Explanation of User Arguments
What are User Arguments?
User arguments are parameters for configuring each asset's operation, entered under args in each asset step of the experimental_plan.yaml file. Each asset in AI Contents' pipeline provides user arguments to apply various features to your data. Users can modify and add user arguments to create a modeling process that fits their data by referring to the guide below.
User arguments are divided into "Mandatory Arguments," which are pre-written in the experimental_plan.yaml file, and "Custom Arguments," which users add by referring to the guide.
Mandatory Arguments
- Mandatory arguments are basic arguments that are immediately visible in the experimental_plan.yaml file. Most mandatory arguments have default values. If an argument has a default value, users do not need to set a value for it; it will operate with the default value.
- Among the mandatory arguments in the experimental_plan.yaml file, users must set values for data-related arguments (e.g., x_columns, y_column).
Custom Arguments
- Custom arguments are not written in the experimental_plan.yaml file but are features provided by the asset that users can add to the experimental_plan.yaml file. Add them under the 'args' of each asset.
AD's pipeline consists of the Input - Readiness - Modeling (train/inference) - Output assets, and user arguments are configured differently for each asset's function. Start with the mandatory user arguments pre-written in the experimental_plan.yaml file, then add custom arguments as needed to build an AD model that fits your data.
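The relationship between default values and user-supplied arguments can be pictured as a simple dictionary merge. The sketch below is an illustration only: the default values are copied from the Train rows of the summary table in this guide, while the function name effective_train_args and the merge behavior itself are assumptions about how the asset resolves arguments.

```python
# Defaults of the mandatory Train arguments, as listed in the summary table.
TRAIN_DEFAULTS = {
    "train_models": ["dt", "sr", "stl_dt", "stl_sr"],
    "decision_rule": "two",
    "hpo_repeat": 20,
    "return_all": True,
    "objective_cal_metric": "distance",
}


def effective_train_args(user_args: dict) -> dict:
    """Overlay user-supplied arguments on top of the defaults."""
    # Later keys win, so user values override defaults and custom keys are added.
    return {**TRAIN_DEFAULTS, **user_args}


# The user changes one mandatory argument and adds one custom argument.
train_args = effective_train_args({"hpo_repeat": 5, "rolling_window": [20, 50, "int"]})
```

Arguments the user leaves untouched, such as decision_rule, keep their default values.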
Summary of User Arguments
Below is the list of AD's user arguments. Click on the 'Argument Name' to jump to the detailed description of that argument.
Default
- The 'Default' column shows the default value of the user argument.
- If there is no default value, it is indicated with '-'.
- If the default value follows a logic, it is indicated with 'See description'. Click on the 'Argument Name' for detailed description.
ui_args
- The 'ui_args' column indicates whether the ui_args feature, which allows changing argument values in the AI Conductor's UI, is supported.
  - O: Enter the argument name under ui_args in the experimental_plan.yaml to change argument values in the AI Conductor UI.
  - X: The ui_args feature is not supported.
- For more details about ui_args, see the following guide: Write UI Parameter
- The AD experimental_plan.yaml includes ui_args_detail for user arguments that can be ui_args.
Mandatory User Configuration
- The 'Mandatory User Configuration' column indicates user arguments that must be checked and changed by the user to run AI Contents.
- O: Arguments generally related to project and data information that users must check before modeling.
- X: If the user does not change the values, modeling proceeds with the default values.
Asset | Argument Type | Argument Name | Default | Description | Mandatory User Configuration | ui_args |
---|---|---|---|---|---|---|
Input | Custom | file_type | csv | Enter the file extension of the input data. | X | X |
Input | Custom | encoding | utf-8 | Enter the encoding type of the input data. | X | X |
Readiness | Mandatory | x_columns | - | Enter the columns for anomaly detection. | O | O |
Readiness | Mandatory | time_column | - | Enter the column with time values to identify each point. | O | O |
Readiness | Mandatory | groupkey | - | If you want to detect anomalies by grouping points, enter the column for group information. Leave blank if not grouping. | O | X |
Readiness | Custom | y_columns | - | If labels exist, enter the columns containing label information for each point. If there are multiple x columns, enter the same number of columns as a list. | X | X |
Preprocess | Mandatory | handling_missing | drop | Decide how to handle rows with missing values. | O | X |
Preprocess | Mandatory | handling_scaling | none | Decide how to scale the data. | O | X |
Preprocess | Mandatory | drop_duplicate_time | True | Decide whether to drop rows with duplicate time column values, keeping only one row. | O | X |
Train | Mandatory | train_models | [dt,sr,stl_dt,stl_sr] | Enter the models to use for anomaly detection. | O | X |
Train | Mandatory | decision_rule | two | Select the direction for anomaly detection. If 'two', detect anomalies above and below the threshold. | O | X |
Train | Mandatory | hpo_repeat | 20 | Decide the number of Bayesian optimization iterations. | O | X |
Train | Mandatory | return_all | True | Decide whether to receive results from all models used. | O | X |
Train | Mandatory | objective_cal_metric | distance | Decide the score metric for Bayesian optimization. | O | X |
Train | Custom | rolling_window | [10, 100, "int"] | Search space for rolling_window parameter in Bayesian optimization of the dynamic_threshold model. | X | X |
Train | Custom | threshold_margin | [0.1, 2, "float"] | Search space for threshold_margin parameter in Bayesian optimization of the dynamic_threshold model. | X | X |
Train | Custom | ma_es | [1, 2, "int"] | Search space for ma_es parameter in Bayesian optimization of the dynamic_threshold model. | X | X |
Train | Custom | window_size_amp | [3, 35, "int"] | Search space for window_size_amp parameter in Bayesian optimization of the spectral_residual model. | X | X |
Train | Custom | window_size_score | [40, 300, "int"] | Search space for window_size_score parameter in Bayesian optimization of the spectral_residual model. | X | X |
Train | Custom | threshold_level | [99, 99.9, "float"] | Search space for threshold_level parameter in Bayesian optimization of the spectral_residual model. | X | X |
Train | Custom | stl_dt_period | [7, 21, "int"] | Search space for stl_dt_period parameter in Bayesian optimization of the stl_dt model. | X | X |
Train | Custom | stl_dt_seasonal | [4, 8, "int"] | Search space for stl_dt_seasonal parameter in Bayesian optimization of the stl_dt model. | X | X |
Train | Custom | stl_dt_threshold_margin | [1, 3, "float"] | Search space for stl_dt_threshold_margin parameter in Bayesian optimization of the stl_dt model. | X | X |
Train | Custom | stl_sr_window_size_amp | [3, 35, "int"] | Search space for stl_sr_window_size_amp parameter in Bayesian optimization of the stl_sr model. | X | X |
Train | Custom | stl_sr_window_size_score | [40, 300, "int"] | Search space for stl_sr_window_size_score parameter in Bayesian optimization of the stl_sr model. | X | X |
Train | Custom | stl_sr_threshold_level | [99, 99.9, "float"] | Search space for stl_sr_threshold_level parameter in Bayesian optimization of the stl_sr model. | X | X |
Train | Custom | stl_sr_period | [7, 21, "int"] | Search space for stl_sr_period parameter in Bayesian optimization of the stl_sr model. | X | X |
Train | Custom | stl_sr_seasonal | [4, 8, "int"] | Search space for stl_sr_seasonal parameter in Bayesian optimization of the stl_sr model. | X | X |
Inference | Mandatory | model_select | all | Decide which model to use for inference. | O | X |
Detailed Description of User Arguments
Input asset
file_type
Enter the file extension of the input data. Currently, AI Solution development only supports CSV files.
- Argument Type: Mandatory
- Input Type
- string
- Input Values
- csv (default)
- Usage
- file_type: csv
- ui_args: X
encoding
Enter the encoding type of the input data. Currently, AI Solution development only supports UTF-8 encoding.
- Argument Type: Mandatory
- Input Type
- string
- Input Values
- utf-8 (default)
- Usage
- encoding: utf-8
- ui_args: X
Readiness asset
y_columns
Enter the columns containing label information for each point for anomaly detection. AD does not require labels by default, so enter this only if you want results that use labels. Multiple columns are supported; in that case, enter one label column for each x column.
- Argument Type: Custom
- Input Type
- list
- Input Values
- Column names
- Usage
- y_columns : [y_col1,y_col2]
- ui_args: X
groupkey
Enter the column with group information if you want to detect anomalies by grouping points. Leave blank if not grouping. Currently supports a single groupkey column.
- Argument Type: Mandatory
- Input Type
- list
- Input Values
- Column name
- Usage
- groupkey : [groupkey_col_example]
- ui_args: X
x_columns
Enter the columns containing data for anomaly detection. Supports multiple columns.
- Argument Type: Mandatory
- Input Type
- list
- Input Values
- Column names
- Usage
- x_columns : [x_col1,x_col2]
- ui_args: O
time_column
Enter the column with time values to identify each point for anomaly detection.
- Argument Type: Mandatory
- Input Type
- string
- Input Values
- Column name
- Usage
- time_column : [time_col_example]
- ui_args: O
Preprocess asset
handling_missing
Decide how to handle rows with missing values for anomaly detection. 'drop' removes the row, 'most_frequent' fills with the most frequent value, 'mean' fills with the mean value, 'median' fills with the median value, and 'interpolation' fills with the interpolated value from adjacent points.
- Argument Type: Mandatory
- Input Type
- string
- Input Values
- drop (default)
- most_frequent
- mean
- median
- interpolation
- Usage
- handling_missing : drop
- ui_args: X
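The five handling_missing options can be sketched on a single column of values. This is a simplified illustration rather than the asset's implementation: it works on one Python list with None marking missing values, whereas the asset handles whole rows, and the treatment of gaps at the start or end of the series during interpolation is an assumption.

```python
import statistics


def _interpolate(values: list) -> list:
    """Fill None entries linearly between the nearest known neighbours."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    if not known:
        return out
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None or right is None:
            # Gap at an edge: copy the nearest known value (an assumption).
            out[i] = values[right if left is None else left]
        else:
            weight = (i - left) / (right - left)
            out[i] = values[left] + weight * (values[right] - values[left])
    return out


def handle_missing(values: list, method: str = "drop") -> list:
    """Apply one of the handling_missing options to a single column."""
    present = [v for v in values if v is not None]
    if method == "drop":
        return present
    if method == "interpolation":
        return _interpolate(values)
    fillers = {
        "most_frequent": statistics.mode,
        "mean": statistics.fmean,
        "median": statistics.median,
    }
    fill = fillers[method](present)
    return [fill if v is None else v for v in values]
```

For example, handle_missing([1.0, None, 3.0], "interpolation") fills the gap with the midpoint of its two neighbours, while "drop" simply removes the missing entry.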
handling_scaling
Decide how to scale the data for anomaly detection. 'standard' uses the mean and standard deviation of the train data to scale to mean 0 and variance 1. 'minmax' scales values to the range 0 to 1 using the min and max values of the train data. 'maxabs' divides values by the maximum absolute value of the train data, which scales them to the range -1 to 1 (0 to 1 for non-negative data). 'robust' scales values using the median and IQR of the train data. 'normalizer' rescales each feature vector to length 1. 'none' (the default) performs no scaling.
- Argument Type: Mandatory
- Input Type
- string
- Input Values
- none (default)
- standard
- minmax
- maxabs
- robust
- normalizer
- Usage
- handling_scaling : minmax
- ui_args: X
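The scaling options can be sketched for a single column as below. This is an illustration under stated assumptions, not the asset's code: the function name fit_scaler is hypothetical, the population standard deviation and the inclusive quartile method are choices made for this sketch, constant columns (zero spread) are not handled, and 'normalizer' is omitted because it works across the features of a row rather than on one column.

```python
import statistics


def fit_scaler(train: list[float], method: str = "none"):
    """Return a function that scales a value using statistics of the train data."""
    if method == "none":
        return lambda x: x
    if method == "standard":
        mean, std = statistics.fmean(train), statistics.pstdev(train)
        return lambda x: (x - mean) / std
    if method == "minmax":
        low, high = min(train), max(train)
        return lambda x: (x - low) / (high - low)
    if method == "maxabs":
        # Divide by the largest absolute value seen in the train data.
        max_abs = max(abs(v) for v in train)
        return lambda x: x / max_abs
    if method == "robust":
        # Centre on the median and divide by the interquartile range.
        q1, median, q3 = statistics.quantiles(train, n=4, method="inclusive")
        return lambda x: (x - median) / (q3 - q1)
    raise ValueError(f"unsupported handling_scaling value: {method!r}")


scale = fit_scaler([0.0, 5.0, 10.0], "minmax")
scaled_inference = [scale(v) for v in [2.5, 10.0]]
```

The scaler is fitted once on the train data and then reused for inference data, so both are scaled with the same statistics.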
drop_duplicate_time
Decide how to handle rows with duplicate time column values for anomaly detection. 'True' keeps only one row and removes duplicates.
- Argument Type: Mandatory
- Input Type
- string
- Input Values
- True (default)
- False
- Usage
- drop_duplicate_time : True
- ui_args: X
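A minimal sketch of the duplicate handling is shown below. The guide only states that one row is kept, so keeping the first occurrence is an assumption of this sketch, as are the function signature and the sample column names.

```python
def drop_duplicate_time(rows: list[dict], time_column: str, enabled: bool = True) -> list[dict]:
    """Keep one row per time value (the first seen, by assumption)."""
    if not enabled:
        return list(rows)
    seen = set()
    kept = []
    for row in rows:
        if row[time_column] not in seen:
            seen.add(row[time_column])
            kept.append(row)
    return kept


# Hypothetical rows with a repeated timestamp.
rows = [
    {"time": "2024-01-01 00:00", "x": 1.0},
    {"time": "2024-01-01 00:00", "x": 1.2},
    {"time": "2024-01-01 00:01", "x": 0.9},
]
deduplicated = drop_duplicate_time(rows, "time")
```

With the setting enabled, the second row is removed because its time value already appeared.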
Train asset
train_models
Select one or more of the four available models to analyze the data for anomaly detection. 'dt' uses the dynamic_threshold model and 'sr' uses the spectral_residual model; the 'stl_dt' and 'stl_sr' models are also available. Enter the models you want to use as a list. If no models are entered, all four models are used by default.
- Argument Type: Mandatory
- Input Type
- list
- Input Values
- [dt,sr,stl_dt,stl_sr] (default)
- [models that users want to use]
- Usage
- train_models: [dt,sr]
- ui_args: X
decision_rule
Decide the direction for anomaly detection. 'upper' detects points above the threshold, 'lower' detects points below the threshold, and 'two' detects points above and below the threshold.
- Argument Type: Mandatory
- Input Type
- string
- Input Values
- two (default)
- upper
- lower
- Usage
- decision_rule: two
- ui_args: X
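The three directions can be illustrated with fixed thresholds. In AD each model derives its own thresholds, so the scalar lower and upper bounds and the function name flag_anomalies are assumptions made only for this sketch.

```python
def flag_anomalies(values: list[float], lower: float, upper: float,
                   decision_rule: str = "two") -> list[bool]:
    """Mark each value as anomalous according to the chosen direction."""
    checks = {
        "upper": lambda v: v > upper,
        "lower": lambda v: v < lower,
        "two": lambda v: v > upper or v < lower,
    }
    is_anomaly = checks[decision_rule]
    return [is_anomaly(v) for v in values]


values = [1.0, 5.0, 9.0]
both_sides = flag_anomalies(values, lower=2.0, upper=8.0, decision_rule="two")
only_high = flag_anomalies(values, lower=2.0, upper=8.0, decision_rule="upper")
```

With 'two', both the low and the high point are flagged; with 'upper', only the high point is.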
hpo_repeat
Decide the number of Bayesian optimization iterations for finding the optimal parameters during anomaly detection. If set to 0, Bayesian optimization is not performed, and parameters are randomly selected from the search space.
- Argument Type: Mandatory
- Input Type
- int
- Input Values
- 20 (default)
- Usage
- hpo_repeat: 20
- ui_args: X
return_all
Decide whether to return results from all models used for anomaly detection. If 'False', only the results of the best model are returned.
- Argument Type: Mandatory
- Input Type
- Boolean
- Input Values
- True (default)
- False
- Usage
- return_all: True
- ui_args: X
objective_cal_metric
Decide the score metric for Bayesian optimization during anomaly detection. 'distance' uses a predefined metric to maximize the distance between the distribution of OK and NG data points. If y columns are present, 'precision', 'recall', or 'f1' can be used to directly maximize these scores.
- Argument Type: Mandatory
- Input Type
- string
- Input Values
- distance (default)
- precision (available only if y columns are present)
- recall (available only if y columns are present)
- f1 (available only if y columns are present)
- Usage
- objective_cal_metric: distance
- ui_args: X
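For the label-based options, the standard definitions of precision, recall, and F1 are sketched below. The encoding of labels (1 for an anomalous point, 0 for a normal one) is an assumption of this example. The 'distance' metric is specific to AD and is not reproduced here.

```python
def classification_scores(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Compute precision, recall, and F1 with 1 = anomaly and 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = classification_scores(y_true=[1, 1, 0, 0], y_pred=[1, 0, 1, 0])
```

Choosing 'precision' favors fewer false alarms, 'recall' favors fewer missed anomalies, and 'f1' balances the two.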
rolling_window
Search space for the rolling_window parameter in the Bayesian optimization of the dynamic_threshold model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.
- Argument Type: Custom
- Input Type
- list
- Input Values
- [10,100,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
- Usage
- rolling_window: [10,100,'int']
- ui_args: X
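The search-space format shared by rolling_window and the other Train custom arguments can be sketched as follows. The example draws one random value from a [min, max, type] list, which corresponds to the random selection described for hpo_repeat: 0, and returns a single value unchanged. The function name sample_parameter and the choice of inclusive bounds are assumptions. The Bayesian optimization itself is not shown.

```python
import random


def sample_parameter(spec, rng: random.Random):
    """Draw one value from a [min, max, type] search space, or return a fixed value."""
    if not isinstance(spec, list):
        return spec  # a single value fixes the parameter
    low, high, kind = spec
    if kind == "int":
        return rng.randint(low, high)  # both bounds inclusive (an assumption)
    if kind == "float":
        return rng.uniform(low, high)
    raise ValueError(f"unsupported value type: {kind!r}")


rng = random.Random(0)  # seeded so the sketch is reproducible
window = sample_parameter([10, 100, "int"], rng)
margin = sample_parameter([0.1, 2, "float"], rng)
fixed = sample_parameter(30, rng)
```

Entering rolling_window: 30 therefore pins the parameter, while rolling_window: [10,100,'int'] leaves it to be searched.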
threshold_margin
Search space for the threshold_margin parameter in the Bayesian optimization of the dynamic_threshold model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.
- Argument Type: Custom
- Input Type
- list
- Input Values
- [0.1,2,'float'] (default)
- [appropriate positive float, appropriate positive float, 'float']
- appropriate positive float
- Usage
- threshold_margin : [0.1,2,'float']
- ui_args: X
ma_es
Search space for the ma_es parameter in the Bayesian optimization of the dynamic_threshold model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.
- Argument Type: Custom
- Input Type
- list
- Input Values
- [1,2,'int'] (default)
- 1 or 2 (1 selects simple moving average, 2 selects exponential moving average)
- Usage
- ma_es : [1,2,'int']
- ui_args: X
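The two averaging modes selected by ma_es can be sketched as below. The guide only states that 1 selects a simple moving average and 2 an exponential moving average; the trailing window, the shorter windows at the start of the series, and the smoothing factor 2 / (window + 1) are common conventions assumed here, not confirmed details of the dynamic_threshold model.

```python
def moving_average(values: list[float], window: int, ma_es: int = 1) -> list[float]:
    """Smooth a series with a simple (ma_es=1) or exponential (ma_es=2) moving average."""
    if not values:
        return []
    if ma_es == 1:
        smoothed = []
        for i in range(len(values)):
            # Trailing window; shorter at the start of the series.
            chunk = values[max(0, i - window + 1): i + 1]
            smoothed.append(sum(chunk) / len(chunk))
        return smoothed
    if ma_es == 2:
        alpha = 2 / (window + 1)  # assumed smoothing factor
        smoothed = [values[0]]
        for v in values[1:]:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
        return smoothed
    raise ValueError("ma_es must be 1 (simple) or 2 (exponential)")


simple = moving_average([1.0, 2.0, 3.0, 4.0], window=2, ma_es=1)
exponential = moving_average([2.0, 4.0], window=3, ma_es=2)
```

The exponential variant weights recent points more heavily, so it reacts faster to level changes than the simple average.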
window_size_amp
Search space for the window_size_amp parameter in the Bayesian optimization of the spectral_residual model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.
- Argument Type: Custom
- Input Type
- list
- Input Values
- [3,35,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
- Usage
- window_size_amp : [3,35,'int']
- ui_args: X
window_size_score
Search space for the window_size_score parameter in the Bayesian optimization of the spectral_residual model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.
- Argument Type: Custom
- Input Type
- list
- Input Values
- [40,300,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
- Usage
- window_size_score : [40,300,'int']
- ui_args: X
threshold_level
Search space for the threshold_level parameter in the Bayesian optimization of the spectral_residual model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.
- Argument Type: Custom
- Input Type
- list
- Input Values
- [99,99.9,'float'] (default)
- [appropriate positive float, appropriate positive float, 'float']
- appropriate positive float
- Usage
- threshold_level : [99,99.9,'float']
- ui_args: X
stl_dt_period
Search space for the stl_dt_period parameter in the Bayesian optimization of the stl_dt model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.
- Argument Type: Custom
- Input Type
- list
- Input Values
- [7,21,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
- Usage
- stl_dt_period : [7,21,'int']
- ui_args: X
stl_dt_seasonal
Search space for the stl_dt_seasonal parameter in the Bayesian optimization of the stl_dt model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.
- Argument Type: Custom
- Input Type
- list
- Input Values
- [4,8,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
- Usage
- stl_dt_seasonal : [4,8,'int']
- ui_args: X
stl_dt_threshold_margin
Search space for the stl_dt_threshold_margin parameter in the Bayesian optimization of the stl_dt model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.
- Argument Type: Custom
- Input Type
- list
- Input Values
- [1,3,'float'] (default)
- [appropriate positive float, appropriate positive float, 'float']
- appropriate positive float
- Usage
- stl_dt_threshold_margin : [1,3,'float']
- ui_args: X
stl_sr_window_size_amp
Search space for the stl_sr_window_size_amp parameter in the Bayesian optimization of the stl_sr model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.
- Argument Type: Custom
- Input Type
- list
- Input Values
- [3,35,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
- Usage
- stl_sr_window_size_amp : [3,35,'int']
- ui_args: X
stl_sr_window_size_score
Search space for the stl_sr_window_size_score parameter in the Bayesian optimization of the stl_sr model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.
- Argument Type: Custom
- Input Type
- list
- Input Values
- [40,300,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
- Usage
- stl_sr_window_size_score : [40,300,'int']
- ui_args: X
stl_sr_threshold_level
Search space for the stl_sr_threshold_level parameter in the Bayesian optimization of the stl_sr model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.
- Argument Type: Custom
- Input Type
- list
- Input Values
- [99,99.9,'float'] (default)
- [appropriate positive float, appropriate positive float, 'float']
- appropriate positive float
- Usage
- stl_sr_threshold_level : [99,99.9,'float']
- ui_args: X
stl_sr_period
Search space for the stl_sr_period parameter in the Bayesian optimization of the stl_sr model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.
- Argument Type: Custom
- Input Type
- list
- Input Values
- [7,21,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
- Usage
- stl_sr_period : [7,21,'int']
- ui_args: X
stl_sr_seasonal
Search space for the stl_sr_seasonal parameter in the Bayesian optimization of the stl_sr model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization. The stl_sr_seasonal value should be an odd number, so if the entered value is n, the actual applied value is 2*n-1.
- Argument Type: Custom
- Input Type
- list
- Input Values
- [4,8,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
- Usage
- stl_sr_seasonal : [4,8,'int']
- ui_args: X
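The odd-number conversion described above is a one-line mapping. The sketch below applies it to the default search space of 4 to 8. The function name applied_seasonal is hypothetical.

```python
def applied_seasonal(n: int) -> int:
    """Map the entered value n to the odd value actually applied, 2*n - 1."""
    return 2 * n - 1


# Default search space [4, 8, 'int'] expressed as applied values.
applied_values = [applied_seasonal(n) for n in range(4, 9)]
```

The default search space therefore corresponds to applied seasonal values from 7 to 15, all odd.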
Inference asset
model_select
Decide which model to use for inference. 'best' uses the best-performing model, and entering a specific model name (dt, sr, stl_dt, or stl_sr) uses that model for inference. 'all' returns inference results from all models.
- Argument Type: Mandatory
- Input Type
- string
- Input Values
- all (default)
- dt
- sr
- stl_dt
- stl_sr
- best
- Usage
- model_select: all
- ui_args: X
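The three kinds of model_select value can be sketched as a small selection function. The data shapes (a dict of per-model results and a dict of per-model scores), the rule that a higher score is better, and the function name select_results are all assumptions made for illustration.

```python
def select_results(results: dict, scores: dict, model_select: str = "all") -> dict:
    """Return inference results for all models, the best model, or one named model."""
    if model_select == "all":
        return dict(results)
    if model_select == "best":
        best = max(scores, key=scores.get)  # higher score is better (assumed)
        return {best: results[best]}
    return {model_select: results[model_select]}


# Hypothetical per-model anomaly flags and training scores.
results = {"dt": [0, 1, 0], "sr": [0, 0, 1]}
scores = {"dt": 0.4, "sr": 0.7}
best_only = select_results(results, scores, "best")
```

In this sample, 'best' returns only the sr results because sr has the higher score.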
AD Version: 2.0.1