Version: Next

AD Parameter

Updated 2024.05.17

Description of experimental_plan.yaml

To apply AI Contents to your data, you need to input data information and the Contents features into the experimental_plan.yaml file. When you install AI Contents in the solution folder, you can find a default experimental_plan.yaml file for each content under the solution folder. By inputting 'data information' and modifying/adding 'user arguments' provided by each asset, you can run ALO to create a data analysis model with your desired settings.

Structure of experimental_plan.yaml

The experimental_plan.yaml file contains various settings required to run ALO. You can use AI Contents immediately by modifying the 'data path' and 'user arguments' sections of these settings.

Input Data Path (`external_path`)

Parameters under external_path specify the paths used to load data and save outputs. If save_train_artifacts_path and save_inference_artifacts_path are left empty, the modeling outputs are saved to the default train_artifacts and inference_artifacts folders.

external_path:
    - load_train_data_path: ./solution/sample_data/train
    - load_inference_data_path: ./solution/sample_data/test
    - save_train_artifacts_path:
    - save_inference_artifacts_path:

Parameter Name	Default	Description and Options
load_train_data_path	./sample_data/train/	Enter the folder path where the training data is located (Do not enter the CSV file name)
load_inference_data_path	./sample_data/test/	Enter the folder path where the inference data is located (Do not enter the CSV file name)

User Parameters (`user_parameters`)

Under user_parameters, each step entry names an asset in the pipeline. For example, step: input marks the input asset stage.
args holds the user arguments for that asset (here, step: input). User arguments are the data analysis settings provided by each asset. For more details, see the Explanation of User Arguments section below.

user_parameters:
    - train_pipeline:
        - step: input
          args:
            - file_type
            ...
          ui_args:
            ...
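
The nesting above (pipeline, then step, then args) can be read with a short lookup. The following Python sketch is illustrative only: the dictionary is a hand-written stand-in for a parsed experimental_plan.yaml, the helper name get_step_args is hypothetical, and the step name "readiness" and all argument values are assumptions rather than values taken from a real plan file.

```python
from typing import Any, Dict

# Hand-written stand-in for a parsed experimental_plan.yaml (values are illustrative).
plan: Dict[str, Any] = {
    "user_parameters": [
        {
            "train_pipeline": [
                {"step": "input", "args": [{"file_type": "csv", "encoding": "utf-8"}]},
                # The step name "readiness" is an assumption made for this sketch.
                {"step": "readiness", "args": [{"x_columns": ["x_col1"]}]},
            ]
        }
    ]
}


def get_step_args(plan: Dict[str, Any], pipeline: str, step: str) -> Dict[str, Any]:
    """Collect the args of one step in one pipeline into a single dict."""
    for entry in plan["user_parameters"]:
        for asset in entry.get(pipeline, []):
            if asset.get("step") == step:
                merged: Dict[str, Any] = {}
                for block in asset.get("args") or []:
                    merged.update(block)
                return merged
    raise KeyError(f"step {step!r} not found in {pipeline!r}")


print(get_step_args(plan, "train_pipeline", "input"))
```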

Explanation of User Arguments

What are User Arguments?

User arguments are parameters for configuring each asset's operation, entered under args in each asset step of the experimental_plan.yaml file. Each asset in AI Contents' pipeline provides user arguments to apply various features to your data. Users can modify and add user arguments to create a modeling process that fits their data by referring to the guide below.

User arguments are divided into "Mandatory Arguments," which are pre-written in the experimental_plan.yaml file, and "Custom Arguments," which users add by referring to the guide.

Mandatory Arguments

Mandatory arguments are basic arguments that are immediately visible in the experimental_plan.yaml file. Most mandatory arguments have default values. If an argument has a default value, users do not need to set a value for it; it will operate with the default value.
Among the mandatory arguments in the experimental_plan.yaml file, users must set values for data-related arguments (e.g., x_columns, y_column).

Custom Arguments

Custom arguments are not written in the experimental_plan.yaml file but are features provided by the asset that users can add to the experimental_plan.yaml file. Add them under the 'args' of each asset.

AD's pipeline consists of the Input - Readiness - Modeling (train/inference) - Output assets, and user arguments are configured differently for each asset's function. Start with the mandatory user arguments pre-written in the experimental_plan.yaml file, then add custom arguments as needed to build an AD model that fits your data.

Summary of User Arguments

Below is the list of AD's user arguments. Click on the 'Argument Name' to jump to the detailed description of that argument.

Default

The 'Default' column shows the default value of the user argument.
If there is no default value, it is indicated with '-'.
If the default value follows a logic, it is indicated with 'See description'. Click on the 'Argument Name' for detailed description.

ui_args

The 'ui_args' column indicates whether the ui_args feature, which allows changing argument values in the AI Conductor's UI, is supported.
O: Enter the argument name under ui_args in the experimental_plan.yaml to change argument values in the AI Conductor UI.
X: The ui_args feature is not supported.
For more details about ui_args, see the following guide: Write UI Parameter
The AD experimental_plan.yaml includes ui_args_detail for user arguments that can be ui_args.

Mandatory User Configuration

The 'Mandatory User Configuration' column indicates user arguments that must be checked and changed by the user to run AI Contents.
O: Arguments generally related to project and data information that users must check before modeling.
X: If the user does not change the values, modeling proceeds with the default values.

Asset	Argument Type	Argument Name	Default	Description	Mandatory User Configuration	ui_args
Input	Mandatory	file_type	csv	Enter the file extension of the input data.	X	X
Input	Mandatory	encoding	utf-8	Enter the encoding type of the input data.	X	X
Readiness	Mandatory	x_columns	-	Enter the columns for anomaly detection.	O	O
Readiness	Mandatory	time_column	-	Enter the column with time values to identify each point.	O	O
Readiness	Mandatory	groupkey	-	If you want to detect anomalies by grouping points, enter the column for group information. Leave blank if not grouping.	O	X
Readiness	Custom	y_columns	-	If labels exist, enter the columns containing label information for each point. If there are multiple x columns, enter the same number of columns as a list.	X	X
Preprocess	Mandatory	handling_missing	drop	Decide how to handle rows with missing values.	O	X
Preprocess	Mandatory	handling_scaling	none	Decide how to scale the data.	O	X
Preprocess	Mandatory	drop_duplicate_time	True	Decide whether to drop rows with duplicate time column values, keeping only one row.	O	X
Train	Mandatory	train_models	[dt,sr,stl_dt,stl_sr]	Enter the models to use for anomaly detection.	O	X
Train	Mandatory	decision_rule	two	Select the direction for anomaly detection. If 'two', detect anomalies above and below the threshold.	O	X
Train	Mandatory	hpo_repeat	20	Decide the number of Bayesian optimization iterations.	O	X
Train	Mandatory	return_all	True	Decide whether to receive results from all models used.	O	X
Train	Mandatory	objective_cal_metric	distance	Decide the score metric for Bayesian optimization.	O	X
Train	Custom	rolling_window	[10, 100, "int"]	Search space for rolling_window parameter in Bayesian optimization of the dynamic_threshold model.	X	X
Train	Custom	threshold_margin	[0.1, 2, "float"]	Search space for threshold_margin parameter in Bayesian optimization of the dynamic_threshold model.	X	X
Train	Custom	ma_es	[1, 2, "int"]	Search space for ma_es parameter in Bayesian optimization of the dynamic_threshold model.	X	X
Train	Custom	window_size_amp	[3, 35, "int"]	Search space for window_size_amp parameter in Bayesian optimization of the spectral_residual model.	X	X
Train	Custom	window_size_score	[40, 300, "int"]	Search space for window_size_score parameter in Bayesian optimization of the spectral_residual model.	X	X
Train	Custom	threshold_level	[99, 99.9, "float"]	Search space for threshold_level parameter in Bayesian optimization of the spectral_residual model.	X	X
Train	Custom	stl_dt_period	[7, 21, "int"]	Search space for stl_dt_period parameter in Bayesian optimization of the stl_dt model.	X	X
Train	Custom	stl_dt_seasonal	[4, 8, "int"]	Search space for stl_dt_seasonal parameter in Bayesian optimization of the stl_dt model.	X	X
Train	Custom	stl_dt_threshold_margin	[1, 3, "float"]	Search space for stl_dt_threshold_margin parameter in Bayesian optimization of the stl_dt model.	X	X
Train	Custom	stl_sr_window_size_amp	[3, 35, "int"]	Search space for stl_sr_window_size_amp parameter in Bayesian optimization of the stl_sr model.	X	X
Train	Custom	stl_sr_window_size_score	[40, 300, "int"]	Search space for stl_sr_window_size_score parameter in Bayesian optimization of the stl_sr model.	X	X
Train	Custom	stl_sr_threshold_level	[99, 99.9, "float"]	Search space for stl_sr_threshold_level parameter in Bayesian optimization of the stl_sr model.	X	X
Train	Custom	stl_sr_period	[7, 21, "int"]	Search space for stl_sr_period parameter in Bayesian optimization of the stl_sr model.	X	X
Train	Custom	stl_sr_seasonal	[4, 8, "int"]	Search space for stl_sr_seasonal parameter in Bayesian optimization of the stl_sr model.	X	X
Inference	Mandatory	model_select	all	Decide which model to use for inference.	O	X

Detailed Description of User Arguments

Input asset

file_type

Enter the file extension of the input data. Currently, AI Solution development only supports CSV files.

Argument Type: Mandatory
Input Type
- string
Input Values
- csv (default)
Usage
- file_type: csv
ui_args: X

encoding

Enter the encoding type of the input data. Currently, AI Solution development only supports UTF-8 encoding.

Argument Type: Mandatory
Input Type
- string
Input Values
- utf-8 (default)
Usage
- encoding: utf-8
ui_args: X

Readiness asset

y_columns

Enter the columns containing label information for each point for anomaly detection. AD does not require labels by default, so enter this only if you want results that use labels. Multiple columns are supported; in that case, provide one label column for each x column.

Argument Type: Custom
Input Type
- list
Input Values
- Column names
Usage
- y_columns : [y_col1,y_col2]
ui_args: X
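
The one-label-column-per-x-column rule can be checked before running the pipeline. This is a minimal sketch under assumptions: the function name check_label_columns is hypothetical, and the text does not say how AD itself validates or pairs the columns.

```python
from typing import List, Optional


def check_label_columns(x_columns: List[str], y_columns: Optional[List[str]]) -> None:
    """Raise if labels are given but their count differs from the x columns."""
    if not y_columns:
        return  # Labels are optional, so an empty or missing list is accepted.
    if len(y_columns) != len(x_columns):
        raise ValueError(
            f"expected {len(x_columns)} label columns, got {len(y_columns)}"
        )


check_label_columns(["x_col1", "x_col2"], ["y_col1", "y_col2"])  # passes silently
check_label_columns(["x_col1", "x_col2"], None)                  # labels omitted
```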

groupkey

Enter the column with group information if you want to detect anomalies by grouping points. Leave blank if not grouping. Currently supports a single groupkey column.

Argument Type: Mandatory
Input Type
- list
Input Values
- Column name
Usage
- groupkey : [groupkey_col_example]
ui_args: X

x_columns

Enter the columns containing data for anomaly detection. Supports multiple columns.

Argument Type: Mandatory
Input Type
- list
Input Values
- Column names
Usage
- x_columns : [x_col1,x_col2]
ui_args: O

time_column

Enter the column with time values to identify each point for anomaly detection.

Argument Type: Mandatory
Input Type
- string
Input Values
- Column name
Usage
- time_column : time_col_example
ui_args: O

Preprocess asset

handling_missing

Decide how to handle rows with missing values for anomaly detection. 'drop' removes the row, 'most_frequent' fills with the most frequent value, 'mean' fills with the mean value, 'median' fills with the median value, and 'interpolation' fills with the interpolated value from adjacent points.

Argument Type: Mandatory
Input Type
- string
Input Values
- drop (default)
- most_frequent
- mean
- median
- interpolation
Usage
- handling_missing : drop
ui_args: X
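
To make the five options concrete, the sketch below applies each one to a single column of values, with None standing for a missing value. It is an illustration in plain Python, not AD's implementation. The function names are hypothetical, and two behaviors are assumptions: interpolation is linear, and a gap at either end of the series copies the nearest observed value.

```python
from collections import Counter
from statistics import mean, median
from typing import List, Optional


def _interpolate(values: List[Optional[float]]) -> List[float]:
    """Linear interpolation between the nearest observed neighbors."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values to interpolate from")
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:        # gap at the start: copy the next observed value
            out[i] = values[right]
        elif right is None:     # gap at the end: copy the previous observed value
            out[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            out[i] = values[left] + weight * (values[right] - values[left])
    return out


def fill_missing(values: List[Optional[float]], how: str = "drop") -> List[float]:
    """Handle missing entries in one column according to handling_missing."""
    present = [v for v in values if v is not None]
    if how == "drop":
        return present
    if how == "most_frequent":
        # On a tie, Counter returns the value that was seen first.
        fill = Counter(present).most_common(1)[0][0]
    elif how == "mean":
        fill = mean(present)
    elif how == "median":
        fill = median(present)
    elif how == "interpolation":
        return _interpolate(values)
    else:
        raise ValueError(f"unknown handling_missing option: {how}")
    return [fill if v is None else v for v in values]


series = [1.0, None, 3.0, 3.0, None]
print(fill_missing(series, "drop"))           # [1.0, 3.0, 3.0]
print(fill_missing(series, "interpolation"))  # [1.0, 2.0, 3.0, 3.0, 3.0]
```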

handling_scaling

Decide how to scale the data for anomaly detection. 'standard' uses the mean and standard deviation of the train data to scale to mean 0 and variance 1. 'minmax' scales values to the range 0 to 1 using the min and max values of the train data. 'maxabs' divides values by the largest absolute value of the train data, which maps them to the range -1 to 1. 'robust' scales values using the median and IQR of the train data. 'normalizer' rescales each feature vector to length 1. With 'none' (the default), no scaling is performed.

Argument Type: Mandatory
Input Type
- string
Input Values
- none (default)
- standard
- minmax
- maxabs
- robust
- normalizer
Usage
- handling_scaling : minmax
ui_args: X
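
The scaling options follow standard formulas, sketched below for a single column in plain Python. This is an illustration, not AD's code: the names fit_scaler and normalize_row are hypothetical, the population standard deviation and the inclusive quartile method are assumptions, and constant columns (zero spread) are not handled. Note that max-abs scaling divides by the largest absolute value, so negative inputs stay negative.

```python
import math
from statistics import mean, median, pstdev, quantiles
from typing import Callable, List


def fit_scaler(train: List[float], how: str) -> Callable[[float], float]:
    """Return a function that scales a value using statistics of the train data."""
    if how == "none":
        return lambda x: x
    if how == "standard":
        mu, sigma = mean(train), pstdev(train)
        return lambda x: (x - mu) / sigma
    if how == "minmax":
        low, high = min(train), max(train)
        return lambda x: (x - low) / (high - low)
    if how == "maxabs":
        peak = max(abs(v) for v in train)
        return lambda x: x / peak
    if how == "robust":
        q1, _, q3 = quantiles(train, n=4, method="inclusive")
        center = median(train)
        return lambda x: (x - center) / (q3 - q1)
    raise ValueError(f"unknown handling_scaling option: {how}")


def normalize_row(row: List[float]) -> List[float]:
    """'normalizer' works per row: rescale the feature vector to length 1."""
    length = math.sqrt(sum(v * v for v in row))
    return [v / length for v in row]


train = [-4.0, 0.0, 2.0, 4.0, 8.0]
for option in ("standard", "minmax", "maxabs", "robust"):
    scale = fit_scaler(train, option)
    print(option, [scale(v) for v in train])
```

Fitting on the train data and reusing the returned function at inference time keeps both datasets on the same scale, which matches the description of each option using train data statistics.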

drop_duplicate_time

Decide how to handle rows with duplicate time column values for anomaly detection. 'True' keeps only one row and removes duplicates.

Argument Type: Mandatory
Input Type
- Boolean
Input Values
- True (default)
- False
Usage
- drop_duplicate_time : True
ui_args: X
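
A minimal sketch of this de-duplication, with rows represented as dictionaries. The text says only that one row is kept, so keeping the first occurrence is an assumption of this sketch, as is the function signature.

```python
from typing import Any, Dict, List


def drop_duplicate_time(
    rows: List[Dict[str, Any]], time_column: str, enabled: bool = True
) -> List[Dict[str, Any]]:
    """Keep one row per time value (the first seen); pass rows through if disabled."""
    if not enabled:
        return list(rows)
    seen = set()
    kept = []
    for row in rows:
        stamp = row[time_column]
        if stamp not in seen:
            seen.add(stamp)
            kept.append(row)
    return kept


rows = [
    {"time": "2024-05-01 00:00", "x": 1.0},
    {"time": "2024-05-01 00:00", "x": 1.2},  # duplicate time value
    {"time": "2024-05-01 00:01", "x": 0.9},
]
print(len(drop_duplicate_time(rows, "time")))         # 2
print(len(drop_duplicate_time(rows, "time", False)))  # 3
```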

Train asset

train_models

The data for anomaly detection will be analyzed using one or more of the four available models. If 'dt' is selected, the 'dynamic_threshold' model will be used. For 'sr', the 'spectral_residual' model will be used. Additionally, 'stl_dt' and 'stl_sr' models are also available. Please input the models you wish to use in the form of a list. If no specific models are input, all four models will be used by default.

Argument Type: Mandatory
Input Type
- list
Input Values
- [dt,sr,stl_dt,stl_sr] (default)
- [models that users want to use]
Usage
- train_models: [dt,sr]
ui_args: X
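
The default behavior and the short-name mapping described above can be sketched as follows. The function name resolve_train_models is hypothetical, and rejecting unknown names with an error is an assumption about how invalid input might be handled.

```python
from typing import List, Optional

ALL_MODELS = ["dt", "sr", "stl_dt", "stl_sr"]
# Long names stated in the text; stl_dt and stl_sr are only given by short name.
LONG_NAMES = {"dt": "dynamic_threshold", "sr": "spectral_residual"}


def resolve_train_models(train_models: Optional[List[str]]) -> List[str]:
    """Return the requested models, or all four when none are given."""
    if not train_models:
        return list(ALL_MODELS)
    unknown = [m for m in train_models if m not in ALL_MODELS]
    if unknown:
        raise ValueError(f"unknown models: {unknown}")
    return list(train_models)


for name in resolve_train_models(["dt", "sr"]):
    print(name, "->", LONG_NAMES.get(name, name))
```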

decision_rule

Decide the direction for anomaly detection. 'upper' detects points above the threshold, 'lower' detects points below the threshold, and 'two' detects points above and below the threshold.

Argument Type: Mandatory
Input Type
- string
Input Values
- two (default)
- upper
- lower
Usage
- decision_rule: two
ui_args: X
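
The three directions can be expressed as a single comparison function. This sketch assumes separate lower and upper threshold values, which the text does not spell out; how each model derives its thresholds is outside the scope of the example.

```python
def is_anomaly(value: float, lower: float, upper: float, decision_rule: str = "two") -> bool:
    """Flag a point according to the direction chosen in decision_rule."""
    if decision_rule == "upper":
        return value > upper
    if decision_rule == "lower":
        return value < lower
    if decision_rule == "two":
        return value > upper or value < lower
    raise ValueError(f"unknown decision_rule: {decision_rule}")


# A point below the lower threshold is ignored by 'upper' but caught by 'two'.
print(is_anomaly(-5.0, lower=-3.0, upper=3.0, decision_rule="upper"))  # False
print(is_anomaly(-5.0, lower=-3.0, upper=3.0, decision_rule="two"))    # True
```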

hpo_repeat

Decide the number of Bayesian optimization iterations for finding the optimal parameters during anomaly detection. If set to 0, Bayesian optimization is not performed, and parameters are randomly selected from the search space.

Argument Type: Mandatory
Input Type
- int
Input Values
- 20 (default)
Usage
- hpo_repeat: 20
ui_args: X

return_all

Decide whether to return results from all models used for anomaly detection. If 'False', only the results of the best model are returned.

Argument Type: Mandatory
Input Type
- Boolean
Input Values
- True (default)
- False
Usage
- return_all: True
ui_args: X

objective_cal_metric

Decide the score metric for Bayesian optimization during anomaly detection. 'distance' uses a predefined metric to maximize the distance between the distribution of OK and NG data points. If y columns are present, 'precision', 'recall', or 'f1' can be used to directly maximize these scores.

Argument Type: Mandatory
Input Type
- string
Input Values
- distance (default)
- precision (available only if y columns are present)
- recall (available only if y columns are present)
- f1 (available only if y columns are present)
Usage
- objective_cal_metric: distance
ui_args: X
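
For reference, the three label-based metrics use the standard definitions below, with 1 marking an anomalous (NG) point and 0 a normal (OK) point. The label encoding and the function name are assumptions of this sketch. The 'distance' metric is predefined by AD and is not reproduced here.

```python
from typing import Dict, List


def label_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute precision, recall, and f1 for binary anomaly labels (1 = anomaly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


print(label_metrics([0, 1, 1, 0, 1], [0, 1, 0, 1, 1]))
```

Precision rewards few false alarms and recall rewards few missed anomalies, so the choice between them depends on which error is more costly for the data at hand.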

rolling_window

Search space for the rolling_window parameter in the Bayesian optimization of the dynamic_threshold model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.

Argument Type: Custom
Input Type
- list
Input Values
- [10,100,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
Usage
- rolling_window: [10,100,'int']
ui_args: X
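
The search-space format ([min, max, type], or a single fixed value) is shared by all the custom Train arguments. The sketch below shows one way to read such a value and draw a random candidate from it, as happens conceptually when parameters are selected randomly from the search space. The function name sample_parameter is hypothetical, and treating both bounds as inclusive is an assumption.

```python
import random
from typing import List, Union

Number = Union[int, float]


def sample_parameter(space: Union[Number, List], rng: random.Random) -> Number:
    """Draw one value from [min, max, type]; return a single fixed value unchanged."""
    if not isinstance(space, list):
        return space  # A single value fixes the parameter.
    low, high, kind = space
    if kind == "int":
        return rng.randint(int(low), int(high))   # both ends inclusive
    if kind == "float":
        return rng.uniform(float(low), float(high))
    raise ValueError(f"unknown value type: {kind}")


rng = random.Random(0)
print(sample_parameter([10, 100, "int"], rng))   # some integer from 10 to 100
print(sample_parameter([0.1, 2, "float"], rng))  # some float from 0.1 to 2
print(sample_parameter(30, rng))                 # 30
```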

threshold_margin

Search space for the threshold_margin parameter in the Bayesian optimization of the dynamic_threshold model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.

Argument Type: Custom
Input Type
- list
Input Values
- [0.1,2,'float'] (default)
- [appropriate positive float, appropriate positive float, 'float']
- appropriate positive float
Usage
- threshold_margin : [0.1,2,'float']
ui_args: X

ma_es

Search space for the ma_es parameter in the Bayesian optimization of the dynamic_threshold model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.

Argument Type: Custom
Input Type
- list
Input Values
- [1,2,'int'] (default)
- 1 or 2 (1 selects simple moving average, 2 selects exponential moving average)
Usage
- ma_es : [1,2,'int']
ui_args: X
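
The two averaging choices selected by ma_es can be compared side by side. This sketch makes several assumptions that the text does not confirm: the window length corresponds to rolling_window, the exponential smoothing factor follows the common 2 / (window + 1) convention, and the first points use a shorter window. The function names are hypothetical.

```python
from typing import List


def simple_moving_average(values: List[float], window: int) -> List[float]:
    """Mean of the last `window` points (fewer at the start of the series)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def exponential_moving_average(values: List[float], window: int) -> List[float]:
    """Recursive average that weights recent points more heavily."""
    alpha = 2 / (window + 1)
    out: List[float] = []
    for v in values:
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


def smooth(values: List[float], window: int, ma_es: int) -> List[float]:
    """ma_es = 1 selects the simple average, ma_es = 2 the exponential one."""
    if ma_es == 1:
        return simple_moving_average(values, window)
    if ma_es == 2:
        return exponential_moving_average(values, window)
    raise ValueError("ma_es must be 1 or 2")


print(smooth([1.0, 2.0, 3.0, 4.0], window=2, ma_es=1))  # [1.0, 1.5, 2.5, 3.5]
print(smooth([1.0, 2.0, 3.0, 4.0], window=3, ma_es=2))  # [1.0, 1.5, 2.25, 3.125]
```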

window_size_amp

Search space for the window_size_amp parameter in the Bayesian optimization of the spectral_residual model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.

Argument Type: Custom
Input Type
- list
Input Values
- [3,35,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
Usage
- window_size_amp : [3,35,'int']
ui_args: X

window_size_score

Search space for the window_size_score parameter in the Bayesian optimization of the spectral_residual model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.

Argument Type: Custom
Input Type
- list
Input Values
- [40,300,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
Usage
- window_size_score : [40,300,'int']
ui_args: X

threshold_level

Search space for the threshold_level parameter in the Bayesian optimization of the spectral_residual model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.

Argument Type: Custom
Input Type
- list
Input Values
- [99,99.9,'float'] (default)
- [appropriate positive float, appropriate positive float, 'float']
- appropriate positive float
Usage
- threshold_level : [99,99.9,'float']
ui_args: X

stl_dt_period

Search space for the stl_dt_period parameter in the Bayesian optimization of the stl_dt model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.

Argument Type: Custom
Input Type
- list
Input Values
- [7,21,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
Usage
- stl_dt_period : [7,21,'int']
ui_args: X

stl_dt_seasonal

Search space for the stl_dt_seasonal parameter in the Bayesian optimization of the stl_dt model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.

Argument Type: Custom
Input Type
- list
Input Values
- [4,8,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
Usage
- stl_dt_seasonal : [4,8,'int']
ui_args: X

stl_dt_threshold_margin

Search space for the stl_dt_threshold_margin parameter in the Bayesian optimization of the stl_dt model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.

Argument Type: Custom
Input Type
- list
Input Values
- [1,3,'float'] (default)
- [appropriate positive float, appropriate positive float, 'float']
- appropriate positive float
Usage
- stl_dt_threshold_margin : [1,3,'float']
ui_args: X

stl_sr_window_size_amp

Search space for the stl_sr_window_size_amp parameter in the Bayesian optimization of the stl_sr model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.

Argument Type: Custom
Input Type
- list
Input Values
- [3,35,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
Usage
- stl_sr_window_size_amp : [3,35,'int']
ui_args: X

stl_sr_window_size_score

Search space for the stl_sr_window_size_score parameter in the Bayesian optimization of the stl_sr model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.

Argument Type: Custom
Input Type
- list
Input Values
- [40,300,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
Usage
- stl_sr_window_size_score : [40,300,'int']
ui_args: X

stl_sr_threshold_level

Search space for the stl_sr_threshold_level parameter in the Bayesian optimization of the stl_sr model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.

Argument Type: Custom
Input Type
- list
Input Values
- [99,99.9,'float'] (default)
- [appropriate positive float, appropriate positive float, 'float']
- appropriate positive float
Usage
- stl_sr_threshold_level : [99,99.9,'float']
ui_args: X

stl_sr_period

Search space for the stl_sr_period parameter in the Bayesian optimization of the stl_sr model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization.

Argument Type: Custom
Input Type
- list
Input Values
- [7,21,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
Usage
- stl_sr_period : [7,21,'int']
ui_args: X

stl_sr_seasonal

Search space for the stl_sr_seasonal parameter in the Bayesian optimization of the stl_sr model. Enter the min value, max value, and value type (int or float) in a list. Only enter if you want to adjust the search space directly. If not entered, the default search space is used. Enter a single value to fix the parameter, skipping Bayesian optimization. The stl_sr_seasonal value should be an odd number, so if the entered value is n, the actual applied value is 2*n-1.

Argument Type: Custom
Input Type
- list
Input Values
- [4,8,'int'] (default)
- [appropriate positive integer, appropriate positive integer, 'int']
- appropriate positive integer
Usage
- stl_sr_seasonal : [4,8,'int']
ui_args: X
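
The 2*n-1 rule means the default search space of 4 to 8 corresponds to applied seasonal values of 7 to 15. A short check (the function name is hypothetical):

```python
def applied_seasonal(n: int) -> int:
    """Map the entered value n to the odd value that is actually applied."""
    return 2 * n - 1


print([applied_seasonal(n) for n in range(4, 9)])  # [7, 9, 11, 13, 15]
```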

Inference asset

model_select

Decide which model to use for inference. 'best' uses the optimal model, entering a specific model name uses that model for inference. 'all' returns inference results from all models.

Argument Type: Mandatory
Input Type
- string
Input Values
- all (default)
- dt
- sr
- stl_dt
- stl_sr
- best
Usage
- model_select: all
ui_args: X
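
The selection logic can be sketched as below. The function name and its inputs (the list of trained models and the name of the best one) are hypothetical; how AD determines the best model is not covered here.

```python
from typing import List


def select_models(model_select: str, trained: List[str], best: str) -> List[str]:
    """Return the models whose inference results should be produced."""
    if model_select == "all":
        return list(trained)
    if model_select == "best":
        return [best]
    if model_select in trained:
        return [model_select]
    raise ValueError(f"model {model_select!r} was not trained")


trained = ["dt", "sr", "stl_dt", "stl_sr"]
print(select_models("all", trained, best="sr"))     # ['dt', 'sr', 'stl_dt', 'stl_sr']
print(select_models("best", trained, best="sr"))    # ['sr']
print(select_models("stl_dt", trained, best="sr"))  # ['stl_dt']
```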

AD Version: 2.0.1
