Appendix: What is the ALO API?
Asset developers can use the APIs provided by ALO to pass data and config to the next Asset, build an ML Pipeline, log messages, and specify where to save artifacts like models to ensure compatibility with Mellerikat. Some of these APIs are mandatory, so let's go through them one by one.
Asset
The Asset class in the alolib module provides APIs to pass data between steps in the Pipeline and to seamlessly transfer models from the Train Pipeline to the Inference Pipeline.
Example
from alolib.asset import Asset
import numpy as np
import pandas as pd

class UserAsset(Asset):
    def __init__(self, asset_structure):
        super().__init__(asset_structure)
        self.args = self.asset.load_args()
        self.config = self.asset.load_config()

    @Asset.decorator_run
    def run(self):
        df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
        output = {'dataframe': df}
        self.asset.save_data(output)          # Mandatory
        self.asset.save_config(self.config)   # Mandatory
Pipeline API
Pipeline API provides methods to load data from the previous step or pass data to the next step in the Pipeline.
save_config(config: dict)
Saves the config, which is a dictionary that has been modified in the current step. The saved dictionary is loaded in the next step using load_config(). Note: Must be called at the end of the Asset's run() method. The value of keys added in the previous step can be changed, but keys cannot be deleted.
- Parameter
- config : A dictionary of configuration information generated or modified in the current step of the Pipeline.
- Example
y_column = self.asset.check_args(arg_key="y_column", is_required=False, default="", chng_type="str")
self.config["y_column"] = y_column
self.asset.save_config(self.config)
save_data(data: dict)
Saves the data processed in the current step to pass it to the next step. Note: Must be called at the end of the Asset's run() method. The value of keys added in the previous step can be changed, but keys cannot be deleted.
- Parameter
- data : A dictionary of data modified in the current step of the Pipeline.
- Example
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
output = {'dataframe': df}
self.asset.save_data(output) # Mandatory
load_config()
Loads the configuration information saved up to the current step in the form of a dictionary. Note: Called by default in the Asset's __init__() method.
- Return
- A dictionary of config
- Example
# Post Asset
self.config['x_columns'] = ['x1', 'x2', 'x3']
self.config['y_column'] = 'yy'
# Current Asset
self.config = self.asset.load_config()
for key, value in self.config.items():
    print(key, ':', value)
- Output
x_columns : ['x1', 'x2', 'x3']
y_column : yy
- Note: Refer to the 'meta' key in the config
The dictionary returned by load_config() in the first asset of the pipeline is not empty. It contains several meta information under the 'meta' key. Asset developers can reference and utilize this information.
'meta': {
'artifacts': {
'input': << input path >>,
'train_artifacts': << train_artifacts path>>,
'inference_artifacts': << inference_artifacts path>>,
'.asset_interface': << .asset_interface path >>,
'history': << history path >>},
'pipeline': 'train_pipeline',
'step_number': 0,
'step_name': << asset name >>
}
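The meta entry can be used, for example, to branch on which pipeline is currently running. Below is a minimal sketch; the helper name `artifacts_dir` and the sample paths are illustrative, not part of the ALO API:

```python
def artifacts_dir(config: dict) -> str:
    """Pick the artifacts directory matching the running pipeline."""
    meta = config['meta']
    key = ('train_artifacts' if meta['pipeline'] == 'train_pipeline'
           else 'inference_artifacts')
    return meta['artifacts'][key]

# Sample config mimicking what load_config() returns in the first asset
sample_config = {
    'meta': {
        'artifacts': {
            'input': '/project/input',
            'train_artifacts': '/project/train_artifacts',
            'inference_artifacts': '/project/inference_artifacts',
        },
        'pipeline': 'train_pipeline',
        'step_number': 0,
        'step_name': 'input',
    }
}
print(artifacts_dir(sample_config))  # -> /project/train_artifacts
```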
load_data()
Loads the data saved by the previous step in the form of a dictionary.
Note: Called by default in the Asset's __init__() method. Note that the first asset in the ML Pipeline does not call load_data(), as there is no data passed from a previous asset.
- Return
- A dictionary of data
- Example
# Previous Asset
df = pd.DataFrame(np.array([ [1, 2, 3], [4, 5, 6] ]))
output = {'dataframe': df}
self.asset.save_data(output)
-------------------------------------------------------
# Current Asset
self.output = self.asset.load_data()
print(self.output['dataframe'])
- Output
   0  1  2
0  1  2  3
1  4  5  6
load_args()
Receives the parameters written as args for the current Asset in the user_parameter section of experimental_plan.yaml, in the form of a dictionary. Note: Called by default in the Asset's __init__() method.
- Return
- A dictionary of arguments
- Example
# experimental_plan.yaml
- step: output
args:
- args_test : sample_args
-------------------------------------------------------
# Asset
self.args = self.asset.load_args()
print(self.args)
- Output
{'args_test': 'sample_args'}
check_args(arg_key: str, is_required: bool, default: str, chng_type: str)
Fills in missing values for the args written in experimental_plan.yaml to prevent errors. When there are many args in experimental_plan.yaml, it can be difficult for users to set them all; setting is_required=False with a default value lets execution proceed and keeps experimental_plan.yaml simple.
- Parameter
- arg_key (str) : The name of the parameter written in the args section of experimental_plan.yaml
- is_required (bool) : Whether the parameter is required
- default (str) : The value to be forcibly entered if the user parameter does not exist
- chng_type (str): The type to convert the value to: list, str, int, float, or bool
- Return
- arg_value : The argument value converted to chng_type; default is returned when the parameter is not set by the user
- Example
x_columns = self.asset.check_args(arg_key="x_columns", is_required=True, chng_type="list")
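To make the semantics above concrete, here is a rough, illustrative re-creation of what check_args() does (the function `check_args_sketch` is not part of alolib, and the real implementation may differ):

```python
def check_args_sketch(args: dict, arg_key: str, is_required: bool = False,
                      default=None, chng_type: str = "str"):
    """Illustrative sketch of check_args(): raise for missing required
    args, fall back to a default for missing optional args, and convert
    the value to chng_type."""
    if arg_key not in args or args[arg_key] is None:
        if is_required:
            raise ValueError(f"required arg '{arg_key}' is missing")
        value = default
    else:
        value = args[arg_key]
    casts = {"str": str, "int": int, "float": float,
             "bool": bool, "list": list}
    return casts[chng_type](value)

args = {'x_columns': ('x1', 'x2')}
print(check_args_sketch(args, 'x_columns', is_required=True, chng_type='list'))
# -> ['x1', 'x2']
print(check_args_sketch(args, 'y_column', default='', chng_type='str'))
# -> '' (missing optional arg replaced by the default)
```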
Path API
Path API is used to save or load data to paths compatible with the Mellerikat system.
get_input_path()
Provides the path where the input data of the pipeline is stored.
- Return
- Train pipeline : {project_home}/input/train
- Inference pipeline : {project_home}/input/inference
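A typical first asset reads every file under this path. The sketch below assumes CSV input; inside an asset the directory would come from self.asset.get_input_path(), while here a plain string stands in for it:

```python
import glob
import os

import pandas as pd

def load_input_csvs(input_path: str) -> pd.DataFrame:
    """Read every CSV under the input path into a single DataFrame."""
    files = sorted(glob.glob(os.path.join(input_path, '*.csv')))
    return pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

# Inside an asset:
# df = load_input_csvs(self.asset.get_input_path())
```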
get_model_path(use_inference_path=False)
Provides the path to save the trained model when running the train pipeline.
It can also be used to receive the model save path to load the model generated by the train pipeline in the inference pipeline.
By default, when an Asset is used by both the train and inference pipelines, the method returns the absolute path under train_artifacts ending with that shared Asset's folder name.
For example, if the preprocess Asset in the inference pipeline calls get_model_path(), it returns the path {project_home}/train_artifacts/models/preprocess/.
Note: An asset step named inference is paired with the train step when calling get_model_path(). For example, if the step named inference calls get_model_path(), it returns the path {project_home}/train_artifacts/models/train/.
- Parameter
- use_inference_path : If True, returns the path under inference_artifacts/models/ instead of train_artifacts
- Return
- Train pipeline : .train_artifacts/models/{step name}/
- Inference pipeline : .train_artifacts/models/{step name (train for the inference step)}/
- When use_inference_path=True : .inference_artifacts/models/{step name (inference for the inference step)}/
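A common pattern is to pickle the model into this directory in the train pipeline and unpickle it in the inference pipeline. A minimal sketch, assuming a picklable model object (the helper names `save_model`/`load_model` are illustrative, not part of alolib):

```python
import os
import pickle

def save_model(model_dir: str, model, filename: str = 'model.pkl') -> str:
    """Persist a model object under the directory from get_model_path()."""
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, filename)
    with open(path, 'wb') as f:
        pickle.dump(model, f)
    return path

def load_model(model_dir: str, filename: str = 'model.pkl'):
    """Load the model previously saved by the train pipeline."""
    with open(os.path.join(model_dir, filename), 'rb') as f:
        return pickle.load(f)

# Train asset:      save_model(self.asset.get_model_path(), model)
# Inference asset:  model = load_model(self.asset.get_model_path())
```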
get_report_path()
Returns the path to save reports generated in the train pipeline. Not available in the inference pipeline.
- Return
- {project_home}/train_artifacts/report/
get_output_path()
Provides the path for saving inference results so that inference data can be reused for retraining. Save output.jpg or output.csv in the path provided by this API; for special purposes, both output.jpg and output.csv can be stored. The saved file is transferred to Edge Conductor for re-labeling through the UI.
- Return
- Train pipeline : {project_home}/train_artifacts/output/
- Inference pipeline : {project_home}/inference_artifacts/output/
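For tabular results, the output.csv mentioned above could be written like this; the helper name `save_output_csv` is illustrative, and inside an asset the directory would come from self.asset.get_output_path():

```python
import os

import pandas as pd

def save_output_csv(output_dir: str, df: pd.DataFrame) -> str:
    """Write the inference result as output.csv under get_output_path()."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, 'output.csv')
    df.to_csv(path, index=False)
    return path

# Inside an inference asset:
# save_output_csv(self.asset.get_output_path(), result_df)
```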
get_extra_output_path()
Provides the path to save additional pipeline outputs, such as data processed for external dashboards or databases.
- Return
- Train pipeline : {project_home}/train_artifacts/extra_output/{step name}
- Inference pipeline : {project_home}/inference_artifacts/extra_output/{step name}
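Dashboard-bound data is often easiest to hand off as JSON. A small sketch (the helper name `save_extra_output` and the file name are assumptions, and the directory would come from self.asset.get_extra_output_path() inside an asset):

```python
import json
import os

def save_extra_output(extra_dir: str, record: dict,
                      filename: str = 'dashboard.json') -> str:
    """Write dashboard-bound data under get_extra_output_path()."""
    os.makedirs(extra_dir, exist_ok=True)
    path = os.path.join(extra_dir, filename)
    with open(path, 'w') as f:
        json.dump(record, f)
    return path
```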
Inference Summary API
Used to create and modify Inference summary information in the inference pipeline.
save_summary()
Saves the result of inference as a summary in the form of a dictionary, which is provided as a UI in Edge Conductor. Inference summary requires result and score information.
- Parameter
- result : The result of performing inference. Up to 32 characters.
- score : Can be used as a criterion for Retrain. For example, probability information. Must be between 0 and 1 and displayed up to two decimal places.
- probability : Provides the probability values for all labels in the case of a classification Solution.
- note : Information about the inference in the AI Solution. Up to 128 characters.
- Example
# {solution_name}/assets/output/asset_output.py
summary = {}
summary['result'] = 'OK'    # e.g., from model.predict()         ## Mandatory
summary['score'] = 0.98     # e.g., from model.predict_proba()   ## Mandatory
summary['note'] = "The score represents the probability value of the model's prediction result."  ## Mandatory
summary['probability'] = {'OK': 0.65, 'NG1': 0.25, 'NG2': 0.1}   # e.g., from model.predict_proba()  ## Optional
self.asset.save_summary(result=summary['result'], score=summary['score'], note=summary['note'], probability=summary['probability'])
load_summary()
Loads the inference summary created by a previous asset in the inference pipeline as a dictionary. Use it to modify the summary information to fit the data.
- Return
- A dictionary with the keys result, score, note, and probability.
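A step that adjusts the loaded summary before re-saving it must keep the score within the constraints save_summary() states (between 0 and 1, shown to two decimal places). The helper below enforces that; it is illustrative, not part of alolib:

```python
def clamp_score(raw_score: float) -> float:
    """Clamp a raw score into [0, 1] and round to two decimals,
    matching the constraints on save_summary()'s 'score' value."""
    return round(min(max(raw_score, 0.0), 1.0), 2)

print(clamp_score(0.987654))  # -> 0.99
print(clamp_score(1.7))       # -> 1.0

# Inside an inference asset:
# summary = self.asset.load_summary()
# summary['score'] = clamp_score(summary['score'])
```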
Log API
When developing an Asset, use the Log API to leave log messages in the terminal and the pipeline.log file.
Using the API, log messages are output and saved in the following format.
[time | USER | log level | file(line) | function]
save_info(msg: str)
Outputs the message to the terminal in the info type format and saves it to the pipeline.log file.
- Parameter
- msg : String message
- Example
self.asset.save_info('Sample info message')
- Output
# Console & pipeline.log
[2024-01-30 09:45:23,611|USER|INFO|asset_input.py(132)|get_data()]: Sample info message
save_warning(msg: str)
Outputs the message to the terminal in the warning type format and saves it to the pipeline.log file.
- Parameter
- msg : String message
- Example
self.asset.save_warning('Sample warning message')
- Output
# Console & pipeline.log
[2024-01-30 09:45:23,611|USER|WARNING|asset_input.py(132)|get_data()]: Sample warning message
save_error(msg: str)
Outputs the message to the terminal in the error type format, saves it to the pipeline.log file, and forcibly terminates the process.
- Parameter
- msg : String message
- Example
self.asset.save_error('Sample error message')
- Output
# Console & pipeline.log
[2024-04-24 22:59:18,446|USER|ERROR|data_input.py(105)|save_error()]
============================= ASSET ERROR =============================
TIME(UTC) : 2024-04-24 22:59:18,446
PIPELINE : inference_pipeline
STEP : input
ERROR(msg) : Sample error message
=======================================================================