GCR Parameter
Overview of experimental_plan.yaml
To apply AI content to your data, you need to enter the data information and the desired content functions in the experimental_plan.yaml file. After you install the AI content in the solution folder, you can find a pre-written experimental_plan.yaml file for each AI content under the solution folder. By entering the 'data information' and modifying or adding the 'user arguments' provided by each asset in this YAML file, you can run ALO to generate a data analysis model with the desired settings.
Structure of experimental_plan.yaml
The experimental_plan.yaml file includes the various settings needed to run ALO. By modifying the 'data path' and 'user arguments' among these settings, you can use the AI content right away.
Inputting Data Paths (external_path)
- The external_path parameter specifies the paths of files to be loaded and the paths where files will be saved. If save_train_artifacts_path and save_inference_artifacts_path are not specified, the modeling artifacts are saved in the default train_artifacts and inference_artifacts folders, respectively.
external_path:
    - load_train_data_path: ./solution/sample_data/train
    - load_inference_data_path: ./solution/sample_data/test
    - save_train_artifacts_path:
    - save_inference_artifacts_path:
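The fallback for empty save paths can be illustrated with a short Python sketch. It is a hypothetical helper, not ALO's actual code: it assumes the external_path list has been flattened into a dict, that an empty YAML value arrives as None, and that the default folders are relative paths (this guide does not say where they are located).

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping of each save key to its default folder name from this guide.
DEFAULT_ARTIFACT_DIRS = {
    "save_train_artifacts_path": "train_artifacts",
    "save_inference_artifacts_path": "inference_artifacts",
}

def resolve_save_path(key: str, value: Optional[str]) -> Path:
    """Return the configured save path, or the default folder if it was left empty."""
    if value is None or str(value).strip() == "":
        return Path(DEFAULT_ARTIFACT_DIRS[key])
    return Path(value)

# The external_path entries from the YAML above, flattened into one dict (assumption).
external_path = {
    "load_train_data_path": "./solution/sample_data/train",
    "load_inference_data_path": "./solution/sample_data/test",
    "save_train_artifacts_path": None,      # left empty in the YAML
    "save_inference_artifacts_path": None,  # left empty in the YAML
}

for key in DEFAULT_ARTIFACT_DIRS:
    print(key, "->", resolve_save_path(key, external_path[key]))
```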
Parameter Name | Default | Description and Options |
---|---|---|
load_train_data_path | ./sample_data/train/ | Enter the path of the folder containing the training data (do not include the csv file name). All csv files under the specified path are concatenated. |
load_inference_data_path | ./sample_data/test/ | Enter the path of the folder containing the inference data (do not include the csv file name). All csv files under the specified path are concatenated. |

*All files under the specified path, including those in subfolders, are concatenated.
*All columns in the files to be concatenated must be identical.
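The concatenation rule above (every csv file under the folder, subfolders included, with identical columns) can be sketched in plain Python. This is an illustration under assumptions, not ALO's loader: `load_concatenated_csv` is a hypothetical name, it uses the standard csv module instead of a dataframe library, reads files as utf-8 to match the default encoding, and treats "identical columns" as the same set of column names.

```python
import csv
from pathlib import Path
from typing import Dict, List

def load_concatenated_csv(folder: str) -> List[Dict[str, str]]:
    """Read every .csv file under `folder` (subfolders included) and return all rows.

    Raises ValueError if the files do not share the same columns.
    """
    files = sorted(Path(folder).rglob("*.csv"))  # recursive search, deterministic order
    if not files:
        raise FileNotFoundError(f"no csv files found under {folder}")
    rows: List[Dict[str, str]] = []
    expected = None
    for path in files:
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            header = set(reader.fieldnames or [])
            if expected is None:
                expected = header
            elif header != expected:
                raise ValueError(f"{path} has columns {sorted(header)}, expected {sorted(expected)}")
            rows.extend(reader)  # each row is a dict keyed by column name
    return rows
```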
User Parameters (user_parameters)
- The step under user_parameters refers to the asset name. For example, step: input refers to the input asset stage, and args refers to the user arguments of the input asset (step: input). User arguments are data analysis-related settings provided by each asset. Refer to the User arguments explanation below for details.
user_parameters:
    - train_pipeline:
        - step: input
          args:
            - file_type
            ...
          ui_args:
            ...
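To make the step/args nesting concrete, here is a Python sketch that looks up the args of one step. It assumes the YAML has already been parsed into Python lists and dicts and that args is a list of mappings; the step name "readiness", the argument values, and `get_step_args` are hypothetical illustrations, not part of ALO.

```python
from typing import Any, Dict, List

# Hypothetical parsed form of the user_parameters block. Step names other than
# "input" and all argument values are illustrative assumptions.
USER_PARAMETERS: List[Dict[str, Any]] = [
    {
        "train_pipeline": [
            {"step": "input",
             "args": [{"file_type": "csv", "encoding": "utf-8"}],
             "ui_args": ["file_type"]},
            {"step": "readiness",
             "args": [{"y_column": "target"}],
             "ui_args": ["y_column"]},
        ]
    }
]

def get_step_args(params: List[Dict[str, Any]], pipeline: str, step: str) -> Dict[str, Any]:
    """Return the args of `step` in `pipeline` as one dict; raise KeyError if absent."""
    for entry in params:
        for step_cfg in entry.get(pipeline, []):
            if step_cfg.get("step") == step:
                merged: Dict[str, Any] = {}
                for item in step_cfg.get("args") or []:
                    merged.update(item)  # each item in the args list is a mapping
                return merged
    raise KeyError(f"step {step!r} not found in {pipeline!r}")

print(get_step_args(USER_PARAMETERS, "train_pipeline", "input"))
# prints {'file_type': 'csv', 'encoding': 'utf-8'}
```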
User arguments explanation
What are User arguments?
User arguments are parameters that configure how each asset operates. They are entered under args in the corresponding asset step of the experimental_plan.yaml. Each asset in the AI content pipeline provides user arguments for applying various functions to the data. Refer to the guide below to change or add user arguments and create a model that fits your data.
User arguments are divided into 'required arguments', which are pre-written in the experimental_plan.yaml, and 'custom arguments', which users can add by referring to the guide.
Required arguments
- Required arguments are the basic arguments that are immediately visible in the experimental_plan.yaml. Most required arguments have default values pre-set in the YAML file.
- Users must enter values for the data-related required arguments in the experimental_plan.yaml (e.g., x_columns, y_column).
Custom arguments
- Custom arguments are arguments provided by the asset but not listed in the experimental_plan.yaml. Users can add these arguments to the args of the corresponding asset in the YAML file.
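How written arguments combine with defaults can be sketched as a simple overlay: anything written under args replaces the default, and everything else keeps the default value from the summary table. The defaults below are the Graph asset values from that table; `effective_args` is a hypothetical helper, and rejecting unknown names is a design choice of this sketch, not documented ALO behavior.

```python
from typing import Any, Dict

# Graph asset defaults taken from the summary table in this guide.
GRAPH_DEFAULTS: Dict[str, Any] = {
    "dimension": 32,
    "num_epochs": 10,
    "num_partitions": 1,
    "use_gpu": False,
    "workers": 1,
    "custom_connection_lhs": None,  # empty by default
    "custom_connection_rhs": None,  # empty by default
    "comparator": "dot",
    "loss_fn": "softmax",
    "lr": 0.01,
    "batch_size": 1000,
}

def effective_args(defaults: Dict[str, Any], user_args: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay the args written in the YAML on top of the defaults (hypothetical helper)."""
    unknown = set(user_args) - set(defaults)
    if unknown:
        # Catch names the asset does not provide, such as the typo "num_epoch".
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    return {**defaults, **user_args}

# A changed required argument plus an added custom argument.
print(effective_args(GRAPH_DEFAULTS, {"dimension": 64, "workers": 4}))
```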
The GCR pipeline consists of the Input, Readiness, Graph, Modeling (train/inference), and Output assets, and each asset provides user arguments suited to its function. First, try modeling with the default required arguments in the experimental_plan.yaml, and then add user arguments to build a GCR model that better fits your data.
Summary of User arguments
Below is a summary of the user arguments for GCR. Each argument is described in detail in the Detailed Explanation of User arguments section.
Default
- The 'Default' column indicates the default value of the user argument.
- If there is no default value, it is marked with '-'.
- If the default is an empty value, it is marked as ' '.
- If the default value is determined by logic, it is marked as 'Refer to the description'; see the argument's detailed explanation.
ui_args
- The 'ui_args' column indicates whether the ui_args function is supported, which allows the argument value to be changed in the AI Conductor UI.
- O: If you enter the argument name under ui_args in the experimental_plan.yaml, you can change the argument value in the AI Conductor UI.
- X: The ui_args function is not supported.
- For a detailed explanation of ui_args, refer to the Write UI Parameter guide.
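Because only arguments marked O support ui_args, a quick pre-check of the names listed under ui_args can catch mistakes before deployment. This is a hypothetical Python sketch; the supported set is copied from the ui_args column of the summary table, and `unsupported_ui_args` is not an ALO function.

```python
from typing import Iterable, List

# Arguments marked "O" in the ui_args column of the summary table in this guide.
UI_ARGS_SUPPORTED = {
    "file_type", "encoding",                      # Input
    "x_columns", "drop_columns", "y_column",      # Readiness
    "dimension", "num_epochs", "num_partitions",  # Graph
    "task", "eval_metric", "num_hpo",             # Train
    "global_xai", "local_xai",                    # Inference
}

def unsupported_ui_args(ui_args: Iterable[str]) -> List[str]:
    """Return the names listed under ui_args that cannot be changed in the UI."""
    return sorted(name for name in ui_args if name not in UI_ARGS_SUPPORTED)

print(unsupported_ui_args(["y_column", "use_gpu", "lr"]))  # prints ['lr', 'use_gpu']
```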
User Configuration Required
- The 'User Configuration Required' column indicates whether the user must check and change the argument before running the AI content.
- O: Generally, task- and data-related information that users need to enter before modeling.
- X: If the user does not change the value, the default value is used for modeling.
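A pre-run check for the O entries in this column can be sketched as follows. In the summary table, y_column is the only argument marked O. The lowercase step key "readiness" and both helper names are assumptions for illustration; an empty YAML value is assumed to arrive as None.

```python
from typing import Any, Dict, List

# Arguments marked "O" in the "User Configuration Required" column of this guide's table.
REQUIRES_USER_CONFIG: Dict[str, List[str]] = {"readiness": ["y_column"]}

def is_unset(value: Any) -> bool:
    """Treat None (an empty YAML value), blank strings, and empty lists as not set."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    return False

def missing_user_config(step: str, args: Dict[str, Any]) -> List[str]:
    """List the arguments of `step` that the user still has to fill in."""
    return [name for name in REQUIRES_USER_CONFIG.get(step, []) if is_unset(args.get(name))]

print(missing_user_config("readiness", {"x_columns": None}))  # prints ['y_column']
```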
Asset Name | Argument Type | Argument Name | Default | Description | User Configuration Required | ui_args |
---|---|---|---|---|---|---|
Input | Required | file_type | csv | Input data file extension. | X | O |
Input | Required | encoding | utf-8 | Input data encoding type. | X | O |
Readiness | Required | x_columns | ' ' | List of x column names to be used for training. If left blank, all columns except y_column are used. | X | O |
Readiness | Required | drop_columns | ' ' | List of column names to exclude from x columns. | X | O |
Readiness | Required | y_column | - | Name of the y column. | O | O |
Graph | Required | dimension | 32 | Number of dimensions for graph embeddings. | X | O |
Graph | Required | num_epochs | 10 | Number of training epochs for the graph embedding algorithm. | X | O |
Graph | Required | num_partitions | 1 | Number of partitions into which the input data is divided for embedding. | X | O |
Graph | Required | use_gpu | False | Whether to use GPU for graph embedding in a GPU-available environment. | X | X |
Graph | Custom | workers | 1 | Number of processes for parallel execution during graph embedding. | X | X |
Graph | Custom | custom_connection_lhs | ' ' | Left-hand columns to be connected based on domain knowledge. | X | X |
Graph | Custom | custom_connection_rhs | ' ' | Right-hand columns to be connected based on domain knowledge. | X | X |
Graph | Custom | comparator | dot | Function to compare the similarity of two embeddings during graph embedding. | X | X |
Graph | Custom | loss_fn | softmax | Loss function for training during graph embedding. | X | X |
Graph | Custom | lr | 0.01 | Learning rate for training during graph embedding. | X | X |
Graph | Custom | batch_size | 1000 | Batch size for training during graph embedding. | X | X |
Train | Required | task | classification | Type of prediction task. | X | O |
Train | Required | eval_metric | f1_score | Evaluation metric for selecting the best model during HPO. | X | O |
Train | Required | num_hpo | 20 | Number of HPO trials. | X | O |
Inference | Required | global_xai | False | Whether to perform global XAI during inference. | X | O |
Inference | Required | local_xai | False | Whether to perform local XAI during inference. | X | O |
Detailed Explanation of User arguments
Input asset
file_type
Specify the file extension of the input data. Currently, AI Solution development only supports csv files.
- Argument type: Required
- Input type
  - string
- Possible values
  - csv (default)
- Usage
  - file_type: csv
- ui_args: O
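Since csv is currently the only supported file_type, a pipeline can validate the value early and filter candidate files by extension. This is a hypothetical Python sketch; the helper names are illustrative, and the case-sensitive extension match is an assumption, not documented ALO behavior.

```python
from pathlib import Path
from typing import Iterable, List

SUPPORTED_FILE_TYPES = {"csv"}  # per this guide, csv is currently the only supported type

def validate_file_type(file_type: str) -> str:
    """Raise early if file_type is not one the Input asset accepts."""
    if file_type not in SUPPORTED_FILE_TYPES:
        raise ValueError(f"unsupported file_type {file_type!r}; expected one of {sorted(SUPPORTED_FILE_TYPES)}")
    return file_type

def files_of_type(paths: Iterable[str], file_type: str = "csv") -> List[str]:
    """Keep only the paths whose extension matches file_type (case-sensitive)."""
    validate_file_type(file_type)
    return [p for p in paths if Path(p).suffix == f".{file_type}"]

print(files_of_type(["train.csv", "notes.txt", "sub/extra.csv"]))  # prints ['train.csv', 'sub/extra.csv']
```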
encoding
Specify the encoding type of the input data. Currently, AI Solution development only supports utf-8 encoding.
- Argument type: Required
- Input type
  - string
- Possible values
  - utf-8 (default)
- Usage
  - encoding: utf-8
- ui_args: O
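Because only utf-8 is supported, it can help to find input files that do not decode as utf-8 before running ALO. The Python sketch below is a hypothetical pre-check, not part of ALO: it reads each file's bytes and reports the ones that raise UnicodeDecodeError.

```python
from pathlib import Path
from typing import Iterable, List, Union

def undecodable_files(paths: Iterable[Union[str, Path]], encoding: str = "utf-8") -> List[str]:
    """Return the files whose bytes cannot be decoded with `encoding`."""
    bad: List[str] = []
    for p in paths:
        try:
            Path(p).read_bytes().decode(encoding)
        except UnicodeDecodeError:
            bad.append(str(p))
    return bad
```

A csv file containing non-ASCII text saved in another encoding, such as cp949, will typically be reported, and can then be re-saved as utf-8.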
Readiness asset
x_columns
Enter the list of x column names in the dataframe. If left blank, all columns except y_column are used as x columns.
- Argument type: Required
- Input type
  - list
- Possible values
  - Empty (default) or list of column names
- Usage
  - x_columns: [col1, col2]
- ui_args: O
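The blank-means-all rule for x_columns can be expressed in a few lines of Python. This is a sketch of the behavior described above, not ALO's code; `resolve_x_columns` is a hypothetical name, and it assumes a blank YAML value arrives as None.

```python
from typing import List, Optional, Sequence

def resolve_x_columns(all_columns: Sequence[str], y_column: str,
                      x_columns: Optional[Sequence[str]] = None) -> List[str]:
    """Use x_columns if given; if left blank, use every column except y_column."""
    if not x_columns:  # None (empty YAML value), "" or [] all count as blank
        return [c for c in all_columns if c != y_column]
    return list(x_columns)

print(resolve_x_columns(["col1", "col2", "target"], "target"))  # prints ['col1', 'col2']
```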
drop_columns
Enter the list of column names to exclude from x columns in the dataframe. If left blank, no columns are excluded.
- Argument type: Required
- Input type
  - list
- Possible values
  - Empty (default) or list of column names
- Usage
  - drop_columns: [col1, col2]
- ui_args: O
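Excluding drop_columns from the x columns amounts to an order-preserving filter. In this Python sketch, `apply_drop_columns` is a hypothetical helper that assumes drop_columns is applied to the already-resolved x columns, as the description above implies.

```python
from typing import List, Optional, Sequence

def apply_drop_columns(x_columns: Sequence[str],
                       drop_columns: Optional[Sequence[str]] = None) -> List[str]:
    """Remove drop_columns from x_columns, keeping the original column order."""
    drop = set(drop_columns or [])  # blank drop_columns: nothing is excluded
    return [c for c in x_columns if c not in drop]

print(apply_drop_columns(["col1", "col2", "col3"], ["col2"]))  # prints ['col1', 'col3']
```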