Appendix: What is the Experimental Plan?
AI Contents operate based on the pipeline information described in experimental_plan.yaml. To run, an ML pipeline needs data, Asset code, and Python packages, all of which must be described in the components of the experimental_plan.yaml file.
Topics
- Components of experimental_plan.yaml
- Titanic Example experimental_plan.yaml

Components of experimental_plan.yaml
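Before going through each component in detail, the sketch below shows the overall skeleton of an experimental_plan.yaml with its top-level components. The name and values are placeholders for illustration only; each component is explained in the sections that follow.

```yaml
## Skeleton of experimental_plan.yaml (placeholder values, for illustration only)
name: "my-ai-contents"
external_path:                  # where to load data from and save artifacts to
    - load_train_data_path:
    - load_inference_data_path:
    - save_train_artifacts_path:
    - save_inference_artifacts_path:
    - load_model_path:
external_path_permission:       # AWS profile used for S3 access
    - aws_key_profile:
user_parameters:                # parameters passed to each Asset (step)
    - train_pipeline:
    - inference_pipeline:
asset_source:                   # git address and Python packages of each Asset
    - train_pipeline:
    - inference_pipeline:
control:                        # ALO runtime settings
    - get_asset_source: once
    - interface_mode: memory
```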
name : Write the name of the AI Contents. It will be recorded and managed in Mellerikat.
external_path : Specify the data to be used for training and inference.
- load_train/inference_data_path: Specify the location of the input data.
  - Supports relative paths, which are resolved from the location of main.py. (e.g., "../sample_data/train/")
  - Supports absolute paths. (e.g., "/home/user.name/alo/train/")
  - Supports S3 paths. (e.g., "s3://ecr-repo-an2-cism-dev/ai-solutions/public/bolt_fastening/train/")
- save_train/inference_artifacts_path: Specify the location to save the pipeline artifacts.
  - Supports relative, absolute, and S3 paths, just like the load paths above.
  - The artifacts are saved in compressed form as train_artifacts.tar.gz, inference_artifacts.tar.gz, and model.tar.gz.
- load_model_path: If the same AI Contents was worked on in a different location, you can load only its model to run the inference pipeline.
Note:
- You can write multiple data folders for load_train/inference_data_path when experimenting. In this case, write them in list form as shown below. The contents under each path's last folder are copied as a whole, including all subfolders and files.
load_train_data_path: [path1/folder1/, path2/folder2/]
- save_train/inference_artifacts_path supports only a single path; the artifacts folder (and, for the training pipeline, the models folder) is compressed and passed as a whole.
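A minimal sketch of a filled-in external_path section is shown below. The folder and bucket names are hypothetical and only illustrate the path formats described above.

```yaml
## external_path sketch (hypothetical paths, for illustration only)
external_path:
    - load_train_data_path: [../sample_data/train/, s3://my-bucket/ai-solutions/train/]  # multiple folders in list form
    - load_inference_data_path: ../sample_data/inference/
    - save_train_artifacts_path: s3://my-bucket/ai-solutions/artifacts/                  # single path only
    - save_inference_artifacts_path: s3://my-bucket/ai-solutions/artifacts/
    - load_model_path: /home/user.name/alo/models/                                       # reuse a model trained elsewhere
```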
external_path_permission : Specify the AWS credentials used to access S3.
- aws_key_profile: Enter the name of the aws configure profile.
- First, run aws configure --profile {profile_name} in the working environment. (Refer to the AWS documentation: https://docs.aws.amazon.com/ko_kr/cli/latest/userguide/cli-configure-files.html)
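If the data is loaded from or saved to S3, the profile set up with aws configure is referenced as in the sketch below; the profile name my-s3-profile is only an example.

```yaml
## external_path_permission sketch (example profile name)
external_path_permission:
    - aws_key_profile: my-s3-profile  # must match a profile created with: aws configure --profile my-s3-profile
```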
user_parameters : Specify the parameter values to be used inside the asset.
- {type}_pipeline: The pipeline type only supports train and inference.
- "step: step_name": Write the name of the Asset (=step).
- "args: {key: value, key2: value2}": Write the parameters to be used in the step.
- Parameter values can use any of the types supported by YAML (int, float, bool, list, dict), as in the sketch below.
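The following is a minimal sketch illustrating these value types; the step name and parameter names are hypothetical.

```yaml
## user_parameters sketch (hypothetical step and parameter names)
user_parameters:
    - train_pipeline:
        - step: preprocess
          args:
            - num_features: 10                  # int
              learning_rate: 0.01               # float
              shuffle: True                     # bool
              x_columns: [col1, col2, col3]     # list
              encoding_map: {low: 0, high: 1}   # dict
```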
asset_source : Write the git address of the asset code.
- {type}_pipeline: The pipeline type only supports train and inference.
- "step: step_name": Write the name of the step.
- "source: {code: git URI or local}": Write the git address. If developing the Asset directly on your local machine, write local.
- "source: {branch: name}": Write the branch name.
- "source: {requirements: [package1, package2 ..]}": Write the names of the Python packages to install. (e.g., "pandas==1.5.3")
Note: If a requirements.txt file with the list of packages already exists in the Asset git repository, write it as "requirements.txt".
Note: Dependent packages are installed in the order in which the Assets are connected in the pipeline. If pandas==1.5.3 is required in the input step but pandas==1.5.4 is required in the train step, the installation of version 1.5.4 is skipped because version 1.5.3 is already installed. However, to avoid package conflicts you may need to reinstall version 1.5.4; in that case, specify it as pandas==1.5.4 --force-reinstall.
Note: If there are no dependent packages required for a specific Asset, leave it as an empty list as shown below.
```yaml
requirements: []
```
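The sketch below shows an asset_source entry combining the cases from the notes above: installing from the Asset repository's requirements.txt, pinning an additional package, and forcing a reinstall to resolve a version conflict. The git URL and step name are hypothetical.

```yaml
## asset_source sketch (hypothetical git URL and step name)
asset_source:
    - train_pipeline:
        - step: train
          source:
            code: http://example.com/assets/my-train-asset.git  # or `local` when developing the Asset locally
            branch: release_1.0
            requirements:
              - requirements.txt                  # install the list kept in the Asset repository
              - pandas==1.5.4 --force-reinstall   # reinstall over a version installed by an earlier step
```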
control : Select the ALO settings according to the experimental environment.
- get_asset_source: Decide whether to git clone the Asset code every time the pipeline runs or only once. Supports once, every.
  - Note that the installation of the Python packages used by the Assets is always checked, and any additional packages are installed before the pipeline runs.
- backup_artifacts: Decide whether to back up the pipeline artifacts in the history. Supports True, False.
- backup_log: Decide whether to back up the pipeline log in the history. Supports True, False. (TBD)
- backup_size: Set the maximum storage size of the history in MB. When the limit is exceeded, the oldest entries are deleted first.
- interface_mode: Decide how to pass parameters and data between assets. Supports memory and file.
  - memory: Pass data between Assets in memory, avoiding the file save/load time and speeding up execution.
  - file: Save each Asset's results as files so that experiments can be compared.
- check_resource: If False, run the pipeline with its default behavior. If True, log memory and CPU usage at the end of each Asset.
- save_inference_format: Compression format for saving inference artifacts to the path specified in save_inference_artifacts_path in external_path. Supports zip or tar.gz.
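As a counterpart to the default settings in the Titanic example below, this sketch shows a control section tuned for debugging an experiment: Asset code is re-cloned on every run, intermediate results are written to files for comparison, and resource usage is logged. The values are illustrative.

```yaml
## control sketch for a debugging-oriented run (illustrative values)
control:
    - get_asset_source: every   # re-clone the Asset code on every run
    - backup_artifacts: True
    - backup_log: True
    - backup_size: 500          # history limit in MB
    - interface_mode: file      # save intermediate results as files to compare experiments
    - save_inference_format: zip
    - check_resource: True      # log memory / CPU usage after each Asset
```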
Titanic Example experimental_plan.yaml
Below is the experimental_plan.yaml of the Titanic example. To keep it simple, the Titanic example does not include UI args-related information. For details on UI args, refer to Write UI Parameter.
name: "demo-titanic"
version: "1.0.0"
## Specify the location to load data from / save results to externally
external_path:
- load_train_data_path: ./solution/sample_data/train_data/ # Recognizes main.py as the current location. Enter the sample_data path bundled with the solution.
- load_inference_data_path: ./solution/sample_data/inference_data/
- save_train_artifacts_path:
- save_inference_artifacts_path:
- load_model_path:
external_path_permission:
- aws_key_profile:
## Set the parameters required for the experiment
## - If deleted from here, it runs with the default parameters written in the code
user_parameters:
- train_pipeline:
- step: input
args:
- x_columns: ['Pclass', 'Sex', 'SibSp', 'Parch']
y_column: Survived
- step: train
args:
- n_estimators: 100
- inference_pipeline:
- step: input
args:
- x_columns: ['Pclass', 'Sex', 'SibSp', 'Parch']
y_column:
- step: inference
args:
- step: output
args: ## Record the installation information of the asset
asset_source:
- train_pipeline:
- step: input
source:
code: http://mod.lge.com/hub/dxadvtech/assets/alo-guide-input.git
branch: release_1.0
requirements:
- pandas==1.5.3
- step: train
source:
code: http://mod.lge.com/hub/dxadvtech/assets/alo-guide-train.git
branch: release_1.0
requirements:
- scikit-learn
- inference_pipeline:
- step: input
source:
code: http://mod.lge.com/hub/dxadvtech/assets/alo-guide-input.git
branch: release_1.0
requirements:
- pandas==1.5.3
- step: inference
source:
code: http://mod.lge.com/hub/dxadvtech/assets/alo-guide-inference.git
branch: release_1.0
requirements: []
- step: output
source:
code: http://mod.lge.com/hub/dxadvtech/assets/alo-guide-output.git
branch: release_1.0
requirements: []
control:
## 1. Decide whether to check the existence of assets and install packages every time the experiment runs or only once / whether to install the requirements.txt and dependencies once or every time
- get_asset_source: once ## once, every
## 2
. Decide whether to back up the generated artifacts to history. True / False
- backup_artifacts: True
## 3. Decide whether to back up the pipeline log to history. True / False
- backup_log: True
## 4. Decide the storage size (in MB)
- backup_size: 1000
## 5. Support memory and file as data transfer methods between Assets
- interface_mode: memory
## 6. Compression format for inference artifacts
- save_inference_format: tar.gz ## tar.gz, zip
## 7. Resource check
- check_resource: False ## True: measure memory, CPU / False
ui_args_detail: []