
AI Contents Modification

Updated 2024.05.05

AI Contents are composed of Assets (also called Steps) that cover the input, preprocessing, modeling, and output stages of the data flow.
These Assets are connected in a defined order to form an ML Pipeline, through which data flows as the Assets execute sequentially.

The experimental_plan.yaml file describes everything needed to configure the ML Pipeline: data source paths, preprocessing parameters, model specifications and hyperparameters, storage paths for result artifacts, and Pipeline operation controls.
When you specify data paths in the experimental_plan.yaml file, ALO copies the data from the specified path to the input/train (or input/inference) path for use during the experiment. This makes it easy to switch data sources, repeat experiments, and run a variety of experiment scenarios.
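The copy step described above can be sketched in Python. This is an illustration of the behavior only, not ALO's actual implementation; the function name and directory layout are assumptions.

```python
import shutil
from pathlib import Path

def stage_input_data(source_dir: str, workspace: str, phase: str = "train") -> Path:
    """Copy experiment data into the pipeline's working input area.

    Illustrative sketch of ALO copying data from the path given in
    experimental_plan.yaml into input/train (or input/inference).
    """
    dest = Path(workspace) / "input" / phase
    dest.mkdir(parents=True, exist_ok=True)
    for f in Path(source_dir).iterdir():
        if f.is_file():
            # copy2 preserves file metadata such as modification time
            shutil.copy2(f, dest / f.name)
    return dest
```

Because the data is copied rather than read in place, changing the source path in experimental_plan.yaml is enough to rerun the same pipeline against a different dataset.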

If the AI Contents do not meet the desired performance criteria when run as delivered, users can modify specific Assets, or add or replace them with newly developed Assets.
For example, if the existing preprocessing Asset is not suitable for the data, you can develop a new preprocessing logic, create a new Asset, and replace it to maximize model performance.
Additionally, to improve the execution speed of the ML Pipeline, you can identify and remove inefficient Assets to make the Pipeline lighter and faster.

ALO is designed to allow easy insertion and deletion of each Asset to adapt to the environment. This allows data scientists and developers to gain in-depth insights into tasks and continuously improve AI model performance through rapid experimentation. The flexible structure of ALO enables application in various tasks and environments, promoting the rapid development and integration of suitable AI Solutions.

Modifying Existing Assets

To modify some of the Assets (= Steps) used by a specific AI Content or AI Solution, first run ALO's main.py once to download the code for each Asset into the assets folder. Then, in the asset_source section of experimental_plan.yaml, change the code key of the relevant Asset step from a git address to local. This prevents the Asset code from being re-downloaded from git while you are modifying it.

Note: The local option can also be used when initially developing a new Asset that does not yet exist in git.
Note: If you are modifying an existing Asset from git rather than developing a new one, set get_asset_source in the control section of experimental_plan.yaml to once so that the Assets are not re-downloaded from git after the first execution of main.py.

#experimental_plan.yaml

asset_source:
    ...
    - inference_pipeline:
        ...
        - step: output
          source:
            code: local # set to local instead of a git address
            branch: custom_output
            requirements:
              - requirements.txt

Then, edit the files in the folder of the Asset step you want to modify under the alo/assets folder, and run main.py to verify the changes.



Adding New Assets

Note: Steps are the same as Assets but are also used to refer to the execution order of a specific Asset in the AI pipeline.

Create a folder named after the new Asset step in the assets folder and create an asset_[step_name].py file.

./{solution_name}/assets/[step_name]
└ asset_[step_name].py
└ requirements.txt
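The folder layout above can be scaffolded with a short helper. This is a convenience sketch, not part of ALO; the helper name and the minimal template content are assumptions.

```python
from pathlib import Path

# Minimal placeholder content; the real file should follow the full Asset Template below.
ASSET_TEMPLATE = '''# -*- coding: utf-8 -*-
from alolib.asset import Asset

class UserAsset(Asset):
    pass
'''

def scaffold_asset(solution_dir: str, step_name: str) -> Path:
    """Create assets/<step_name>/ containing asset_<step_name>.py and requirements.txt."""
    step_dir = Path(solution_dir) / "assets" / step_name
    step_dir.mkdir(parents=True, exist_ok=True)
    (step_dir / f"asset_{step_name}.py").write_text(ASSET_TEMPLATE)
    (step_dir / "requirements.txt").touch()
    return step_dir
```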

Create asset_[step_name].py based on the following Asset Template and develop it according to the purpose.

#asset_[step_name].py

# -*- coding: utf-8 -*-
import os
import sys
from alolib.asset import Asset

sys.path.append(os.path.dirname(os.path.abspath(__file__)))


#--------------------------------------------------------------------------------------------------------------------------
# CLASS
#--------------------------------------------------------------------------------------------------------------------------
class UserAsset(Asset):
    def __init__(self, asset_structure):
        super().__init__(asset_structure)
        ## Load this Asset's user_parameters from experimental_plan.yaml as a dict
        self.args = self.asset.load_args()
        ## Load the config passed from the previous Asset as a dict
        self.config = self.asset.load_config()

        ## Save the information to be passed to the next Asset as a dict
        ## - Information can be passed to the next Asset in the form of self.config['new_key'] = 'new_value'
        ## Load the data passed from the previous Asset
        self.input_data = self.asset.load_data()
        ## (Note: deleting keys from the config or data received from the previous Asset causes an ALO error.)

    @Asset.decorator_run
    def run(self):
        ## Pass data and config to the next Asset.
        ## This example forwards the data and config received from the previous Asset unchanged.
        output_data = self.input_data
        output_config = self.config

        self.asset.save_data(output_data)
        self.asset.save_config(output_config)


#--------------------------------------------------------------------------------------------------------------------------
# MAIN
#--------------------------------------------------------------------------------------------------------------------------
if __name__ == "__main__":
    ua = UserAsset(envs={}, argv={}, data={}, config={})
    ua.run()

Add the newly developed Asset to the user_parameters and asset_source sections of experimental_plan.yaml to change the configuration of the ML pipeline.

#experimental_plan.yaml

user_parameters:
    ...
    - train_pipeline:
        ...
        - step: {step_name}
          args:
            - handling_missing: dropna

asset_source:
    ...
    - train_pipeline:
        ...
        - step: {step_name}
          source:
            code: local
            branch:
            requirements:
              - requirements.txt


Deleting Existing Assets

If the execution of a specific Asset in the ML pipeline is unnecessary, you can comment it out or delete it in experimental_plan.yaml.

Note: The relevant Asset must be deleted or commented out in both the user_parameters and asset_source sections to prevent errors.

#experimental_plan.yaml

user_parameters:
    ...
    - train_pipeline:
        ...
        # - step: preprocess
        #   args:
        #     - mode: auto # auto, custom
        #       custom: {}

asset_source:
    ...
    - train_pipeline:
        ...
        # - step: preprocess
        #   source: ## supports git / local
        #     code: {preprocess Asset git address}
        #     branch: prep_v1.0.0
        #     requirements:
        #       - requirements.txt


Controlling ML Pipeline Operations

#experimental_plan.yaml

control:
    ## 1. Decide whether to check for package installation and Asset presence (and install requirements.txt dependencies) at every run or only once
    - get_asset_source: once ## once, every

    ## 2. Decide whether to back up the generated artifacts
    - backup_artifacts: True ## True, False

    ## 3. Decide whether to back up the pipeline logs
    - backup_log: True ## True, False

    ## 4. Determine the maximum backup storage size (unit: MB)
    - backup_size: 1000

    ## 5. Data transfer between Assets: supports memory and file
    - interface_mode: file ## memory, file

    ## 6. Decide whether to check resources such as memory and CPU during execution
    - check_resource: False ## True, False

    ## 7. Compression format used when delivering inference artifacts to an external path
    - save_inference_format: tar.gz ## tar.gz, zip

- get_asset_source: # once / every

When you install an existing AI Content or AI Solution in the solution folder and run python main.py for the first time, ALO creates an assets folder and various Asset folders under it, and then downloads the source code of each Asset from their respective git repositories. If you re-download the Assets every time you run python main.py, any changes made by the user to some Asset codes will be lost. Therefore, if you set get_asset_source in the control section of experimental_plan.yaml to once, ALO will download the Asset codes only once during the first execution, and will skip downloading if the Asset folder (e.g., input) already exists under the assets folder in subsequent executions. If you set it to every, the Asset codes are downloaded at each run.
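The once/every rule above amounts to a simple existence check. The following is an illustrative reimplementation of that rule, not ALO's actual code; the function name is an assumption.

```python
import os

def should_download_asset(assets_dir: str, step_name: str, get_asset_source: str) -> bool:
    """Return True if the Asset's source code should be (re)downloaded from git.

    - "every": always download, overwriting any local modifications.
    - "once":  download only if assets/<step_name> does not exist yet.
    """
    if get_asset_source == "every":
        return True
    # "once": skip the download when the Asset folder already exists
    return not os.path.isdir(os.path.join(assets_dir, step_name))
```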

- backup_artifacts: # True / False

After running python main.py, folders like train_artifacts or inference_artifacts are generated. If backup_artifacts is set to True, ALO copies the artifact folders to the history folder for backup.

- backup_log (TBD): # True / False

Under train_artifacts or inference_artifacts, there is a log folder containing the pipeline.log and process.log files. Logs saved by the Asset developer using ALO's logging API are stored in pipeline.log. If backup_log is set to True, these log files are saved as part of the backup.

- backup_size: # 1000 (int)

Depending on the computing resources of the user's experiment environment, unlimited history backup can cause problems due to insufficient capacity. Therefore, if you set backup_size to 1000, history backup is performed only up to 1000MB.
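One way to enforce such a cap is to delete the oldest history backups until the total size fits. This is a sketch of the capacity rule described above, assuming a simple oldest-first pruning strategy; ALO's actual behavior may differ.

```python
import shutil
from pathlib import Path

def prune_history(history_dir: str, backup_size_mb: int) -> None:
    """Delete the oldest history backups until the total size is <= backup_size_mb."""
    history = Path(history_dir)
    # Oldest backups first, by modification time
    backups = sorted(history.iterdir(), key=lambda p: p.stat().st_mtime)

    def total_mb() -> float:
        return sum(f.stat().st_size for f in history.rglob("*") if f.is_file()) / 1e6

    while backups and total_mb() > backup_size_mb:
        oldest = backups.pop(0)
        if oldest.is_dir():
            shutil.rmtree(oldest)
        else:
            oldest.unlink()
```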

- interface_mode: # memory / file

Asset developers use APIs like asset.save_data() and asset.save_config() to pass data and config information to the next Asset. If interface_mode is set to memory, the data and config are passed between Assets in memory without being saved as separate files. If interface_mode is set to file, the data and config are saved as files in the interface folder, and the next Asset reads and loads them from these files.
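The difference between the two modes can be illustrated with two minimal interface classes. These are conceptual sketches only (class names and the pickle-based file format are assumptions), not ALO's internal implementation.

```python
import pickle
from pathlib import Path

class MemoryInterface:
    """interface_mode: memory -- objects are handed to the next Asset in-process."""
    def __init__(self):
        self.store = {}

    def save(self, name: str, obj) -> None:
        self.store[name] = obj

    def load(self, name: str):
        return self.store[name]

class FileInterface:
    """interface_mode: file -- objects are serialized into an interface folder."""
    def __init__(self, interface_dir: str):
        self.dir = Path(interface_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def save(self, name: str, obj) -> None:
        (self.dir / f"{name}.pkl").write_bytes(pickle.dumps(obj))

    def load(self, name: str):
        return pickle.loads((self.dir / f"{name}.pkl").read_bytes())
```

The file mode survives process boundaries and makes intermediate results inspectable on disk, at the cost of serialization overhead; the memory mode is faster but keeps everything in one process.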

- check_resource: # True / False

Decides whether to log the memory and CPU resource status during the execution of each Asset. Setting it to True may slow down the program due to resource access time.

- save_inference_format: # zip / tar.gz

Determines the compression file format (zip or tar.gz) for inference artifacts stored in an external path.
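The format switch can be sketched with the standard tarfile and zipfile modules. This is an illustration of the setting's effect, not ALO's packaging code; the output file naming is an assumption.

```python
import tarfile
import zipfile
from pathlib import Path

def compress_inference_artifacts(artifacts_dir: str, out_stem: str,
                                 save_inference_format: str = "tar.gz") -> Path:
    """Compress the inference artifacts folder as tar.gz or zip, per the setting."""
    src = Path(artifacts_dir)
    if save_inference_format == "zip":
        out = Path(f"{out_stem}.zip")
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
            for f in src.rglob("*"):
                # store paths relative to the artifacts folder
                zf.write(f, f.relative_to(src))
    else:
        out = Path(f"{out_stem}.tar.gz")
        with tarfile.open(out, "w:gz") as tf:
            tf.add(src, arcname=src.name)
    return out
```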



Generating Inference Summary Information

Developing an AI Solution involves modifying AI Contents to suit the operational task. While keeping the algorithm, you can change or add Data In & Out formats and optimize parameters. One of the key aspects of AI Solution development is modifying the inference results to match the task requirements.

To facilitate this, ALO provides the self.asset.save_summary() API for evaluating and summarizing inference results. The summary generated by this API can be reviewed in the user interface (UI) of Edge Conductor, allowing evaluation of the model's inference performance.

If the format or content of the existing inference results included in AI Contents does not meet project requirements, users can load the existing inference results using the asset.load_summary() function, make the necessary adjustments, and save them again. This allows customizing the inference results to provide more in-depth or detailed information, or adjusting the format to better fit specific operational environments. For more detailed information about the ALO API, refer to the Appendix: What is ALO API page.

Inference results of various AI models can be diverse and numerous. However, customers using the AI Solution service will want a single, intuitive inference Score value. Therefore, data scientists need to create inference summary information containing inference results and inference scores. ALO provides the self.asset.save_summary() API to achieve this purpose.

In the future, Edge Conductor will receive a file containing summary information about the Inference performed in ALO within the Edge App. When you click on the summary in the Edge Conductor UI, the information will be displayed.

- result:

Represents the result of the inference execution. It can be up to 32 lowercase characters.

- score:

Can be used as a criterion for determining retraining and is displayed up to two decimal places. It is written only in Python's default float type. For example, in a classification problem, the probability value of the predicted result is used.

- note:

Provides reference information about the Inference in the AI Solution. It can be up to 128 lowercase characters. For example, an explanation of what the score means, which will be shown in the Edge Conductor UI.

- probability:

An optional key. For a classification AI Solution with a single piece of Inference Data, it holds the probability values for all labels (e.g., 'OK', 'NG1', 'NG2').

# {solution_name}/assets/output/asset_output.py
summary = {}
summary['result'] = 'OK'  # e.g., from model.predict() ## Mandatory
summary['score'] = 0.98   # e.g., from model.predict_proba() ## Mandatory
summary['note'] = "The score is the probability value of the model's prediction."  ## Mandatory
summary['probability'] = {'OK': 0.65, 'NG1': 0.25, 'NG2': 0.1}  # e.g., from model.predict_proba() ## Optional
self.asset.save_summary(result=summary['result'], score=summary['score'], note=summary['note'], probability=summary['probability'])
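Since the summary fields carry the constraints listed above (result up to 32 characters, score a plain Python float, note up to 128 characters), a small pre-check before calling self.asset.save_summary() can catch mistakes early. This validator is an illustrative addition, not part of the ALO API; ALO may enforce these rules differently.

```python
def validate_summary(result, score, note, probability=None) -> None:
    """Check inference summary fields against the documented constraints."""
    if not isinstance(result, str) or len(result) > 32:
        raise ValueError("result must be a string of at most 32 characters")
    if not isinstance(score, float):
        raise TypeError("score must be a plain Python float")
    if not isinstance(note, str) or len(note) > 128:
        raise ValueError("note must be a string of at most 128 characters")
    if probability is not None and not all(
            isinstance(v, float) for v in probability.values()):
        raise TypeError("probability must map each label to a float")
```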


Modifying Inference Output

The output files of inference results can come in various forms and may be multiple. However, to support the core functionality of Mellerikat's retraining feature, there are constraints on the output file format.
Note: Inference result files should be saved as a single table (.csv) file or a single image file (.jpg, .png, .svg) using ALO's asset.get_output_path() API. Only one file of each type can be saved.
If the data to be included in the output file is a dataframe and you want to use the output file later as input data for retraining requests, the output file should concatenate the inference results with the original input data. Edge Conductor provides a detailed view of the inference output data with a simple UI click. To obtain the path for saving the output in the Asset code, use the following API.

self.asset.get_output_path()
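Writing a single .csv that joins the original input rows with the inference results can be sketched as follows. The output directory would normally come from self.asset.get_output_path(); the 'prediction' column name, the file name, and the function itself are illustrative assumptions.

```python
import csv
from pathlib import Path

def write_inference_output(input_rows, predictions, output_dir: str) -> Path:
    """Write one output.csv joining original input rows with inference results.

    input_rows: list of dicts (one per record); predictions: parallel list of results.
    """
    out = Path(output_dir) / "output.csv"
    # Original input columns first, then the appended inference result column
    fieldnames = list(input_rows[0].keys()) + ["prediction"]
    with out.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row, pred in zip(input_rows, predictions):
            writer.writerow({**row, "prediction": pred})
    return out
```

Keeping the input columns alongside the predictions is what makes the single output file usable as training data in a later retraining request.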

To save data for external dashboards or databases, save the output file to the path obtained using the self.asset.get_extra_output_path() API.

self.asset.get_extra_output_path()


Registering an AI Solution

After completing AI Solution development, proceed with Registering an AI Solution.