
Writing the experimental_plan.yaml

Updated 2025.02.20

Once you have a basic understanding of the experimental_plan.yaml, you will need to understand each asset installed in the assets folder when ALO's main.py is run after installing the Titanic example solution. Follow the guide below to create such assets from scratch.



Writing the experimental_plan.yaml

Because creating a training pipeline and creating an inference pipeline are largely the same process, understanding the experimental_plan.yaml below will let you write either pipeline.

Use the experimental_plan.yaml generated by the 'alo example' CLI command in the terminal after ALO installation is complete.

alo example                # Generate the Titanic example
vi experimental_plan.yaml  # Any text editor can be used instead of vim

The experimental_plan.yaml file organizes the settings related to the execution of AI solutions in YAML format. This file starts by setting the name and version of the AI solution and includes execution history management, modeling-related settings, and data pipeline configuration. Below is an explanation of each section.

AI Solution Information (name, version)

The name and version fields at the top of the file set the name and version of the AI solution being developed.
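For example, the first two lines of the template at the end of this page identify the Titanic solution:

name: titanic       # AI Solution name
version: 1.0.0      # AI Solution version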

Execution History Management (Control)

The control section defines the settings for managing execution history, including backup policies and resource usage log output options. The backup setting can use one of three methods: size, count, or day.
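For example, this excerpt from the template below uses count-based backup with a value of 5 and turns on resource logging; the size- and day-based variants appear commented out in the full template:

control:
    backup:
        type: count         # Backup method (size, count, day)
        value: 5
    check_resource: True    # Log CPU and memory usage during execution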

Modeling Settings (Solution)

The solution section defines the environment settings for modeling, including necessary library installations and user-defined function settings.

Library Installation (pip)

Use the requirements field under pip to specify the libraries needed to run the code; for example, you can pin the required versions of numpy, pandas, and scikit-learn. If requirements: True is set instead, ALO reads requirements.txt in the current path and installs the packages listed there.
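For example, the pip block in the template below pins numpy and pandas to specific versions and leaves scikit-learn unpinned:

solution:
    pip:
        requirements:
            - numpy==1.26.4
            - pandas==1.5.3
            - scikit-learn

If you maintain a requirements.txt file instead, set requirements: True and ALO will install from that file.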

User-defined Functions (function)

Set user-defined functions under the function section. Each function specifies the module and function path with the def field and sets the arguments for the function with the argument field.
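For example, in the template below the train entry points to the train function inside titanic.py and passes the column settings as arguments:

solution:
    function:
        train:
            def: titanic.train      # {python file name}.{function name}
            argument:
                x_columns: [ 'Pclass', 'Sex', 'SibSp', 'Parch' ]
                y_column: Survived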

Dataset and Pipeline Settings (Train, Inference)

Finally, the train and inference sections define the dataset paths and the order in which the functions are executed. For each pipeline, set the data path (dataset_uri), the result storage path (artifact_uri), and the function execution sequence (pipeline).
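For example, the train pipeline in the template below reads data from train_dataset/, stores results in train_artifact/, and runs preprocess followed by train; the inference pipeline is defined the same way and can additionally set model_uri to load a pre-trained model:

solution:
    train:
        dataset_uri: [train_dataset/]
        artifact_uri: train_artifact/
        pipeline: [preprocess, train]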

The experimental_plan.yaml file is an important configuration file for systematically managing the execution environment, data processing pipeline, and backup policies for AI solutions. It allows for easy changes to various experiment settings and ensures consistent execution environments for AI solutions.

## experimental_plan.yaml template
# AI Solution Information
name: titanic       # AI Solution name
version: 1.0.0      # AI Solution version

# The control section configures execution history management. Choose size, count, or day as appropriate for your system.
control:
    # Execution History Management
    # backup:               # Optional) Backup based on disk usage
    #     type: size        # Required) Backup method (size, count, day)
    #     value: 5MB        # Required) Storage size. Default 1GB. e.g., 1000000000, 1GB, 1MB, 1KB
    backup:                 # Optional) Backup based on the number of executions
        type: count         # Required) Backup method (size, count, day)
        value: 5            # Optional) Default: 1000
    # backup:               # Optional) Backup based on the number of days since the last execution
    #     type: day         # Required) Backup method (size, count, day)
    #     value: 7          # Optional) Default: 7 days

    check_resource: True    # Optional) Log CPU and memory usage during execution. Default: False

# Enter modeling-related information under the solution section.
# Specify the libraries required to run the code under requirements in pip.
solution:
    pip:                    # Optional) Settings for installing 3rd-party libraries for user-defined functions
        # requirements: False           # Optional) Set True to install requirements.txt in the current path
        requirements:                   # Optional) To install individual libraries
            - numpy==1.26.4
            - pandas==1.5.3
            - scikit-learn
            # - scikit-learn --index-url https://my.local.mirror.com/simple
    # credential:           # Optional) Credential information for accessing files in train/inference input, output (S3)
    #     profile_name: aws-profile-name

    # Define functions under function and fill in the argument values. This maps to the user-written code.
    function:               # Required) User-defined functions
        preprocess:         # Function name -> used as the name in train.pipeline
            def: titanic.preprocess     # Required) Function to execute in the user module
        train:
            def: titanic.train          # {python file name}.{function name}
            argument:                   # Function args
                x_columns: [ 'Pclass', 'Sex', 'SibSp', 'Parch' ]
                y_column: Survived
        inference:
            def: titanic.inference      # {python file name}.{function name}
            argument:                   # Function args
                x_columns: [ 'Pclass', 'Sex', 'SibSp', 'Parch' ]

    # For dataset_uri, enter the data folder path; for artifact_uri, enter the location where the results of each step are stored.
    # For inference, you can optionally set model_uri.
    # Set the function execution order through pipeline.
    train:
        dataset_uri: [train_dataset/]   # Data folder or list of folders (file paths not allowed)
        # dataset_uri: s3://mellerikat-test/tmp/alo/   # Example 1) All folders and files under the S3 key (prefix)
        artifact_uri: train_artifact/
        pipeline: [preprocess, train]   # Function execution list
    inference:
        dataset_uri: inference_dataset/
        # model_uri: model_artifacts/n100_depth5.pkl   # Load a pre-trained model
        artifact_uri: inference_artifact/   # Optional) Compression and upload path for files stored under pipeline['artifact']['workspace']. Default: inference.tar.gz
        pipeline: [preprocess, inference]