FCST Parameter
Overview of experimental_plan.yaml
To apply AI Contents to your data, you need to specify information about your data and the Contents functions you want to use in the experimental_plan.yaml file. When you install AI Contents in the solution folder, you can find a pre-written experimental_plan.yaml file under the contents folder. By entering the data information and modifying or adding the user arguments provided by each asset in this YAML file, you can use ALO to generate a data analysis model with the desired settings.
Structure of experimental_plan.yaml
The experimental_plan.yaml contains various settings needed to run ALO. By modifying the 'data path' and 'user arguments' parts of these settings, you can use AI Contents immediately.
Entering Data Paths (external_path)
- The external_path parameters specify the paths from which data is loaded and to which artifacts are saved. If save_train_artifacts_path and save_inference_artifacts_path are not specified, the modeling artifacts are saved in the default train_artifacts and inference_artifacts folders.
```yaml
external_path:
    - load_train_data_path: ./solution/sample_data/train
    - load_inference_data_path: ./solution/sample_data/test
    - save_train_artifacts_path:
    - save_inference_artifacts_path:
```
Parameter Name | Default | Description and Options |
---|---|---|
load_train_data_path | ./sample_data/train/ | Path to the folder containing training data (no file names) |
load_inference_data_path | ./sample_data/test/ | Path to the folder containing inference data (no file names) |
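For illustration, the fragment below fills in the two save paths instead of leaving them blank. The folder names ./my_artifacts/train and ./my_artifacts/inference are hypothetical, and the sketch assumes relative save paths are resolved the same way as the load paths above.

```yaml
external_path:
    - load_train_data_path: ./solution/sample_data/train
    - load_inference_data_path: ./solution/sample_data/test
    # Hypothetical output folders. Leave these blank to fall back to the
    # default train_artifacts and inference_artifacts folders.
    - save_train_artifacts_path: ./my_artifacts/train
    - save_inference_artifacts_path: ./my_artifacts/inference
```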
User Parameters (user_parameters)
- The step entries under user_parameters represent asset names. In the example below, step: input indicates the input asset step, and args holds the user arguments for the input asset (step: input). User arguments are data analysis settings provided by each asset; refer to the explanation of user arguments below for details.
```yaml
user_parameters:
    - train_pipeline:
        - step: input
          args:
            - file_type
              ...
          ui_args:
              ...
```
Explanation of User Arguments
What are User Arguments?
User arguments are parameters that control each asset's operation; they are written under args in each asset step of experimental_plan.yaml. AI Contents provide user arguments to apply various functions to the data, and users can modify and add user arguments to perform modeling that suits their data.
User arguments are divided into "Required arguments," which are pre-written in experimental_plan.yaml, and "Custom arguments," which users can add by referring to the guide provided by each asset.
Required Arguments
- Required arguments are basic arguments that are immediately visible in experimental_plan.yaml. Most Required arguments have default values; when a default exists, the argument works with that value even if the user does not set it.
- Among the Required arguments in experimental_plan.yaml, data-related arguments must be set by the user (e.g., y_column, time_column).
Custom Arguments
- Custom arguments are not pre-written in experimental_plan.yaml; they are additional functions provided by each asset. Users can add them under 'args' of the corresponding asset in experimental_plan.yaml.
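As a sketch of adding a Custom argument, the fragment below appends groupkey_column (a Custom argument of the Readiness asset) next to the pre-written Required arguments. The step name readiness and the column name region are assumptions for illustration; the layout mirrors the input step shown above.

```yaml
user_parameters:
    - train_pipeline:
        - step: readiness             # assumed step name for the Readiness asset
          args:
            - y_column: target        # Required argument, pre-written in the file
              time_column: time       # Required argument, pre-written in the file
              groupkey_column: region # Custom argument added by the user (hypothetical column)
```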
The FCST pipeline consists of the Input, Readiness, Preprocess, Modeling (train/inference), and Output assets, and the user arguments of each asset are structured according to its function. Start with the Required user arguments written in experimental_plan.yaml, then add user arguments to create an FCST model tailored to your data!
Summary of User Arguments
Below is a summary of the user arguments for FCST. A detailed explanation of each argument follows the table.
Default
- The 'Default' field indicates the default value of the respective user argument.
- If there is no default value, it is indicated as '-'.
- If the default is determined by logic, it is indicated as 'Refer to the explanation'; see that argument's detailed explanation below.
ui_args
- The 'ui_args' column in the table below indicates whether the ui_args function, which allows argument values to be changed in the AI Conductor UI, is supported.
- O: If you enter the argument name under ui_args in experimental_plan.yaml, you can change the argument value in the AI Conductor UI.
- X: ui_args is not supported for this argument.
- For a detailed explanation of ui_args, refer to the Write UI Parameter guide.
- The FCST experimental_plan.yaml pre-writes all potential ui_args for user arguments under ui_args_detail.
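A minimal sketch of exposing arguments in the AI Conductor UI, assuming ui_args is a plain list of argument names under the step (check the Write UI Parameter guide for the exact layout). Only arguments marked O in the ui_args column, such as y_column and time_column, support this; the step name readiness is assumed.

```yaml
user_parameters:
    - train_pipeline:
        - step: readiness        # assumed step name
          args:
            - y_column: target
              time_column: time
          ui_args:               # argument names editable in the UI (assumed list layout)
            - y_column
            - time_column
```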
Required User Settings
- The 'Required User Settings' column in the table below indicates whether the user must check and change the user argument for the AI Contents to operate.
- O: These are generally arguments related to the task and data that the user must check before modeling.
- X: If the user does not change the value, modeling proceeds with the default value.
Asset Name | Argument Type | Argument Name | Default | Description | Required User Settings | ui_args |
---|---|---|---|---|---|---|
Input | Custom | file_type | csv | Enter the file extension of the input data. | O | X |
Input | Custom | encoding | utf-8 | Enter the encoding type of the input data. | X | X |
Readiness | Required | y_column | target | Enter the name of the target column to predict. | O | O |
Readiness | Required | time_column | time | Enter the name of the column containing time information. | O | O |
Readiness | Required | time_format | "%Y-%m-%d" | Enter the format of the time information. | O | O |
Readiness | Required | sample_frequency | daily | Enter the frequency of the time information. Available values: yearly, monthly, weekly, daily, hourly, minutely, secondly | O | O |
Readiness | Required | input_chunk_length | 6 | Enter the length of the input time series for the model. Please enter the value based on the unit set in sample_frequency. | O | O |
Readiness | Required | forecast_periods | 3 | Enter the length of the time series to predict for the model. Please enter the value based on the unit set in sample_frequency. | O | O |
Readiness | Custom | groupkey_column | None | Enter the name of the column containing group key information, if available. | X | X |
Readiness | Custom | x_covariates | [] | Enter a list of names of x columns that change over time, if available. | X | X |
Readiness | Custom | static_covariates | [] | Enter a list of names of columns containing unique information for each group, such as franchise names or equipment types, if available. | X | X |
Readiness | Custom | static_cov_unify_method | latest | If static_covariates are not the same within a group, unify them into one value. Choices are "oldest" (earliest value), "latest" (most recent value), and "most_common" (most frequent value). | X | X |
Preprocess | Custom | normalizing_method | minmax | Enter the data normalization method. Available values: minmax, z-norm | X | X |
Preprocess | Custom | encoding_method | onehot | Enter the encoding method for categorical variables. Available values: onehot, label | X | X |
Preprocess | Custom | linear_interpolation | False | If True, linear interpolation will fill in any missing values within the time series data for each group. Recommended only if missing values in the middle of the data are a problem. | X | X |
Preprocess | Custom | global_padding_interpolation | False | If True, pads the time series data for each group to match the minimum and maximum time indices. Recommended only if the start and end times for each group should be identical. | X | X |
Preprocess | Custom | global_padding_method | zero | Enter the padding method: "zero" for zero padding, "mean" for padding with the group mean value, "same" for padding with the earliest and latest values in the group. | X | X |
Preprocess | Custom | global_time_index_begin | None | Enter the minimum time index, if available. If blank, it defaults to the minimum time index in all groups. Must match the time_format in readiness. | X | X |
Preprocess | Custom | global_time_index_end | None | Enter the maximum time index, if available. If blank, it defaults to the maximum time index in all groups. Must match the time_format in readiness. | X | X |
Preprocess | Custom | outlier_smoothing | False | If True, detects outliers in x covariates for each group using the Isolation Forest method and replaces them with the preceding values. Recommended only if outliers affect prediction. | X | X |
Preprocess | Custom | isolationforest_contamination | 0.001 | The expected proportion of outliers in the entire time series, used by the Isolation Forest model. Values between 0 and 0.3 are typically used. | X | X |
Preprocess | Custom | expand_features | False | If True, generates features for x covariates using the tsfresh package. Recommended for machine learning models, depending on resource availability. | X | X |
Preprocess | Custom | expand_method | minimal | Enter the feature generation method in the tsfresh package: "minimal" for statistical features only, "comprehensive" for all features. | X | X |
Preprocess | Custom | ensure_stationarity | False | If True, checks the stationarity of x covariates and transforms them by taking the square root and first difference if not stationary. Recommended for machine learning models. | X | X |
Train | Required | forecaster_name | nbeats | Select the model to use for forecasting. Available value: nbeats. | O | X |
Train | Custom | do_validation | True | Whether to hold out evaluation data for performance evaluation. Set to False if there are too many group keys. | X | X |
Train | Custom | cv_numbers | 1 | The number of cross-validation splits. Setting it to 1 is recommended for experiments. | X | X |
Train | Custom | full_train | True | If do_validation is True, whether to train the final model on the entire data. Set to True to reflect the latest trends in the final model. | X | X |
Train | Custom | optimize_parameters | False | Whether to run hyperparameter optimization. With large datasets, False is recommended to limit running time. | X | X |
Train | Custom | use_gpu | False | Whether to use GPU. Recommended to set to True if there is a lot of data and GPU is available. | X | X |
Train | Custom | memory_check | False | Whether to monitor memory usage during training and inference. | X | X |
Train | Custom | runtime_check | False | Whether to measure execution time during training. Memory checking slows execution, so set memory_check to False when using runtime_check. | X | X |
Train | Custom | metric_to_compare | mae | Evaluation metric. Available values: mae, mape, smape, mse, rmse, r2_score | X | X |
Train | Custom | model_parameters | {nbeats: {"n_epochs": 2, "batch_size": 800,...}} | Parameters related to model training. If not set, the model is trained with default parameters. See the detailed parameter explanation below. | X | X |
Detailed Explanation of User Arguments
Input Asset
file_type
Enter the file extension of the input data.
- Argument type: Custom
- Input type: string
- Available values:
- csv (default)
- Usage:
- file_type : csv
- ui_args: X
encoding
Enter the encoding type of the input data.
- Argument type: Custom
- Input type: string
- Available values:
- utf-8 (default)
- Usage:
- encoding : utf-8
- ui_args: X
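Both Input arguments sit together in the args of the input step. The sketch below simply writes out the defaults in the layout shown earlier; csv and utf-8 are the only values listed for these arguments.

```yaml
user_parameters:
    - train_pipeline:
        - step: input
          args:
            - file_type: csv     # file extension of the input data
              encoding: utf-8    # text encoding of the input data
```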
Readiness Asset
y_column
Enter the name of the target column to predict.
- Argument type: Required
- Input type: string
- Available values:
- '' (default)
- Usage:
- y_column : target
- ui_args: O
time_column
Enter the name of the column containing time information.
- Argument type: Required
- Input type: string
- Available values:
- '' (default)
- Usage:
- time_column : time
- ui_args: O
time_format
Enter the format of the time information.
- Argument type: Required
- Input type: string
- Available values:
- "%Y-%m-%d" (default)
- Usage:
- time_format : "%Y-%m-%d"
- ui_args: O
sample_frequency
Enter the frequency of the time information.
- Argument type: Required
- Input type: string
- Available values:
- yearly
- monthly
- weekly
- daily (default)
- hourly
- minutely
- secondly
- Usage:
- sample_frequency : daily
- ui_args: O
input_chunk_length
Enter the length of the input time series for the model. Please enter the value based on the unit set in sample_frequency.
- Argument type: Required
- Input type: integer
- Available values:
- 6 (default)
- Usage:
- input_chunk_length : 6
- ui_args: O
forecast_periods
Enter the length of the time series to predict for the model. Please enter the value based on the unit set in sample_frequency.
- Argument type: Required
- Input type: integer
- Available values:
- 3 (default)
- Usage:
- forecast_periods : 3
- ui_args: O
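To show how input_chunk_length and forecast_periods are counted in units of sample_frequency, the sketch below assumes hypothetical hourly data with timestamps such as 2024-01-01 13:00:00. Under these settings the model reads 24 samples (24 hours) of history and predicts the next 6 samples (6 hours). The step name readiness is assumed.

```yaml
user_parameters:
    - train_pipeline:
        - step: readiness                        # assumed step name
          args:
            - y_column: target
              time_column: time
              time_format: "%Y-%m-%d %H:%M:%S"   # quoted; matches e.g. 2024-01-01 13:00:00
              sample_frequency: hourly
              input_chunk_length: 24             # 24 hourly samples of history
              forecast_periods: 6                # forecast the next 6 hours
```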
groupkey_column
Enter the name of the column containing group key information, if available.
- Argument type: Custom
- Input type: string
- Available values:
- None (default)
- Usage:
- groupkey_column : region
- ui_args: X
x_covariates
Enter a list of names of x columns that change over time, if available.
- Argument type: Custom
- Input type: list
- Available values:
- [] (default)
- Usage:
- x_covariates : []
- ui_args: X
static_covariates
Enter a list of names of columns containing unique information for each group, such as franchise names or equipment types, if available.
- Argument type: Custom
- Input type: list
- Available values:
- [] (default)
- Usage:
- static_covariates : []
- ui_args: X
static_cov_unify_method
If static_covariates are not the same within a group, unify them into one value. Choices are "oldest" (earliest value), "latest" (most recent value), and "most_common" (most frequent value).
- Argument type: Custom
- Input type: string
- Available values:
- latest (default)
- oldest
- most_common
- Usage:
- static_cov_unify_method : latest
- ui_args: X
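The Readiness Custom arguments work together for grouped data. The sketch below assumes a hypothetical store-sales dataset: store_id identifies each series, temperature and promotion change over time, and store_type is fixed per store. All column names and the step name readiness are assumptions.

```yaml
user_parameters:
    - train_pipeline:
        - step: readiness                          # assumed step name
          args:
            - y_column: sales
              time_column: date
              groupkey_column: store_id            # one time series per store
              x_covariates: [temperature, promotion]   # values that change over time
              static_covariates: [store_type]      # one fixed value per store
              static_cov_unify_method: most_common # resolve conflicting store_type values
```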
Preprocess Asset
normalizing_method
Enter the data normalization method.
- Argument type: Custom
- Input type: string
- Available values:
- minmax (default)
- z-norm
- Usage:
- normalizing_method : minmax
- ui_args: X
encoding_method
Enter the encoding method for categorical variables.
- Argument type: Custom
- Input type: string
- Available values:
- onehot (default)
- label
- Usage:
- encoding_method : onehot
- ui_args: X
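For example, to switch from the defaults to the other listed options, z-norm normalization and label encoding, the preprocess args could look like the sketch below. The step name preprocess is assumed.

```yaml
user_parameters:
    - train_pipeline:
        - step: preprocess                # assumed step name
          args:
            - normalizing_method: z-norm  # instead of the default minmax
              encoding_method: label      # instead of the default onehot
```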
linear_interpolation
If True, linear interpolation will fill in any missing values within the time series data for each group. Recommended only if missing values in the middle of the data are a problem.
- Argument type: Custom
- Input type: boolean
- Available values:
- False (default)
- True
- Usage:
- linear_interpolation : False
- ui_args: X
global_padding_interpolation
If True, pads the time series data for each group to match the minimum and maximum time indices. Recommended only if the start and end times for each group should be identical.
- Argument type: Custom
- Input type: boolean
- Available values:
- False (default)
- True
- Usage:
- global_padding_interpolation : False
- ui_args: X
global_padding_method
Enter the padding method: "zero" for zero padding, "mean" for padding with the group mean value, "same" for padding with the earliest and latest values in the group.
- Argument type: Custom
- Input type: string
- Available values:
- zero (default)
- mean
- same
- Usage:
- global_padding_method : zero
- ui_args: X
global_time_index_begin
Enter the minimum time index, if available. If blank, it defaults to the minimum time index in all groups. Must match the time_format in readiness.
- Argument type: Custom
- Input type: string
- Available values:
- None (default)
- Usage:
- global_time_index_begin : None
- ui_args: X
global_time_index_end
Enter the maximum time index, if available. If blank, it defaults to the maximum time index in all groups. Must match the time_format in readiness.
- Argument type: Custom
- Input type: string
- Available values:
- None (default)
- Usage:
- global_time_index_end : None
- ui_args: X
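The sketch below combines the interpolation and padding options, assuming every group should span the same fixed window. The dates are hypothetical and follow the default readiness time_format ("%Y-%m-%d"); they are quoted so YAML reads them as strings, matching the string input type. The step name preprocess is assumed.

```yaml
user_parameters:
    - train_pipeline:
        - step: preprocess                         # assumed step name
          args:
            - linear_interpolation: True           # fill gaps inside each group's series
              global_padding_interpolation: True   # extend every group to the same time range
              global_padding_method: mean          # pad with each group's mean value
              global_time_index_begin: "2023-01-01"  # hypothetical start, in time_format
              global_time_index_end: "2023-12-31"    # hypothetical end, in time_format
```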
outlier_smoothing
If True, detects outliers in x covariates for each group using the Isolation Forest method and replaces them with the preceding values. Recommended only if outliers affect prediction.
- Argument type: Custom
- Input type: boolean
- Available values:
- False (default)
- True
- Usage:
- outlier_smoothing : False
- ui_args: X
isolationforest_contamination
The expected proportion of outliers in the entire time series, used by the Isolation Forest model. Values between 0 and 0.3 are typically used.
- Argument type: Custom
- Input type: float
- Available values:
- 0.001 (default)
- Usage:
- isolationforest_contamination : 0.001
- ui_args: X
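A sketch of enabling outlier smoothing with a larger contamination value than the default. It assumes, as in common Isolation Forest implementations, that a higher contamination value flags more points as outliers; 0.01 (roughly 1% of points) is an illustrative choice, and the step name preprocess is assumed.

```yaml
user_parameters:
    - train_pipeline:
        - step: preprocess                      # assumed step name
          args:
            - outlier_smoothing: True           # replace detected outliers in x covariates
              isolationforest_contamination: 0.01  # illustrative: about 1% of points treated as outliers
```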
expand_features
If True, generates features for x covariates using the tsfresh package. Recommended for machine learning models, depending on resource availability.
- Argument type: Custom
- Input type: boolean
- Available values:
- False (default)
- True
- Usage:
- expand_features : False
- ui_args: X
expand_method
Enter the feature generation method in the tsfresh package: "minimal" for statistical features only, "comprehensive" for all features.
- Argument type: Custom
- Input type: string
- Available values:
- minimal (default)
- comprehensive
- Usage:
- expand_method : minimal
- ui_args: X
ensure_stationarity
If True, checks the stationarity of x covariates and transforms them by taking the square root and first difference if not stationary. Recommended for machine learning models.
- Argument type: Custom
- Input type: boolean
- Available values:
- False (default)
- True
- Usage:
- ensure_stationarity : False
- ui_args: X
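When a machine learning model is used, the feature-generation and stationarity options can be enabled together, as in the sketch below. The minimal method is kept because generating all tsfresh features ("comprehensive") is considerably more expensive; the step name preprocess is assumed.

```yaml
user_parameters:
    - train_pipeline:
        - step: preprocess               # assumed step name
          args:
            - expand_features: True      # generate tsfresh features for x covariates
              expand_method: minimal     # statistical features only; "comprehensive" costs more
              ensure_stationarity: True  # square root + first difference if not stationary
```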
Train Asset
forecaster_name
Select the model to use for forecasting. More models supported by Darts will be added in the future.
- Argument type: Required
- Input type: string
- Available values:
- nbeats (default)
- Usage:
- forecaster_name : nbeats
- ui_args: X
do_validation
Whether to hold out evaluation data for performance evaluation. Set to False if there are too many group keys.
- Argument type: Custom
- Input type: boolean
- Available values:
- True (default)
- False
- Usage:
- do_validation : True
- ui_args: X
cv_numbers
The number of cross-validation splits. Setting it to 1 is recommended for experiments.
- Argument type: Custom
- Input type: integer
- Available values:
- 1 (default)
- Usage:
- cv_numbers : 1
- ui_args: X
full_train
If do_validation is True, whether to train the final model on the entire data. Set to True to reflect the latest trends in the final model.
- Argument type: Custom
- Input type: boolean
- Available values:
- True (default)
- False
- Usage:
- full_train : True
- ui_args: X
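Putting the Train arguments together, the sketch below validates on one split and then retrains on the full data. The step name train is assumed, and model_parameters shows only the two values given in the summary table (n_epochs, batch_size); the remaining N-BEATS parameters elided there are not filled in here.

```yaml
user_parameters:
    - train_pipeline:
        - step: train                  # assumed step name for the Train asset
          args:
            - forecaster_name: nbeats
              do_validation: True      # hold out evaluation data
              cv_numbers: 1            # a single split, as recommended for experiments
              full_train: True         # retrain the final model on all data
              metric_to_compare: mae
              model_parameters:
                nbeats:
                  n_epochs: 2
                  batch_size: 800
```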