Version: docs v25.02

Appendix: What is the ALO API

Updated 2024.05.05

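Through the APIs that ALO provides, Asset developers can pass data and config to the next Asset to build an ML Pipeline, write logs, and find out where to store artifacts such as models so that they remain compatible with Mellerikat. Some of these APIs must be called, so this appendix goes through them one by one.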

Asset

The Asset class in the alolib module provides APIs for passing data between steps in a Pipeline, and for handing the model over from the Train Pipeline to the Inference Pipeline.

Example

from alolib.asset import Asset
import numpy as np
import pandas as pd

class UserAsset(Asset):
    def __init__(self, asset_structure):
        super().__init__(asset_structure)
        # Load the user arguments and the config saved by earlier steps.
        self.args = self.asset.load_args()
        self.config = self.asset.load_config()

    @Asset.decorator_run
    def run(self):
        df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
        output = {'dataframe': df}
        self.asset.save_data(output)  # Mandatory
        self.asset.save_config(self.config)  # Mandatory

Pipeline API

The Pipeline API is provided for loading data from the previous step in a Pipeline and for passing data on to the next step.

save_config(config: dict)

Saves the config dict as changed in the current step. The saved dict is what the next step receives when it calls load_config(). Note: this must be called at the end of the Asset's run(). The values of keys added in earlier steps may be changed, but keys cannot be deleted.

Parameter
- config : configuration information, in dict form, created or changed in the current step of the Pipeline.

Example

y_column = self.asset.check_args(arg_key="y_column", is_required=False, default="", chng_type="str")
self.config["y_column"] = y_column
self.asset.save_config(self.config)
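
The key-preservation rule in the note above can be checked before calling save_config(). The following is a minimal sketch, not part of ALO: the helper name find_removed_keys is hypothetical, the config values are made up, and only top-level keys of plain dicts are compared.

```python
def find_removed_keys(previous: dict, updated: dict) -> list:
    """Return the top-level keys present in `previous` but missing from `updated`."""
    return sorted(key for key in previous if key not in updated)


# Config as loaded at the start of the step (illustrative values).
loaded_config = {"x_columns": ["x1", "x2"], "y_column": "yy"}

# Changing a value and adding a new key is allowed.
ok_config = dict(loaded_config, y_column="target", threshold=0.5)

# Dropping a key that an earlier step added is not allowed.
bad_config = {"y_column": "yy"}

removed_ok = find_removed_keys(loaded_config, ok_config)
removed_bad = find_removed_keys(loaded_config, bad_config)
print(removed_ok)   # []
print(removed_bad)  # ['x_columns']
```

The same check applies to the dict passed to save_data(), which follows the same rule.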

save_data(data: dict)

Saves the data processed in the current step so that it can be sent to the next step. Note: this must be called at the end of the Asset's run(). The values of keys added in earlier steps may be changed, but keys cannot be deleted.

Parameter
- data : data, in dict form, changed in the current step of the Pipeline.

Example

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
output = {'dataframe':df}
self.asset.save_data(output) # Mandatory

load_config()

Loads, as a dict, the configuration information saved before the current step. Note: this is called by default in the Asset's __init__().

Return
- config in dict form

Example

# Previous Asset
self.config['x_columns'] = ['x1', 'x2', 'x3']
self.config['y_column'] = 'yy'
self.asset.save_config(self.config)

# Current Asset
self.config = self.asset.load_config()
for key, value in self.config.items():
    print(key, ':', value)

Output
```
x_columns : ['x1', 'x2', 'x3']
y_column : yy
```

Note: the 'meta' key in config. The dict returned by load_config in the first asset of a pipeline is not empty. It contains several pieces of meta information under the key 'meta', as shown below. Asset developers can refer to and use this information.

```
'meta': {
    'artifacts': {
        'input': << input path >>,
        'train_artifacts': << train_artifacts path >>,
        'inference_artifacts': << inference_artifacts path >>,
        '.asset_interface': << .asset_interface path >>,
        'history': << history path >>
    },
    'pipeline': 'train_pipeline',
    'step_number': 0,
    'step_name': << asset name >>
}
```
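
As a rough illustration of using the meta information, the sketch below reads a few fields from a config dict shaped like the one above. The dict literal and its path are made-up placeholders, and describe_step is a hypothetical helper, not an ALO API.

```python
def describe_step(config: dict) -> str:
    """Build a short label such as 'train_pipeline/step 0 (input)' from config['meta']."""
    meta = config.get("meta", {})
    pipeline = meta.get("pipeline", "unknown_pipeline")
    step_number = meta.get("step_number", -1)
    step_name = meta.get("step_name", "unknown_step")
    return f"{pipeline}/step {step_number} ({step_name})"


# Placeholder config; in an Asset this would come from self.asset.load_config().
example_config = {
    "meta": {
        "artifacts": {"input": "/tmp/project/input"},  # placeholder path
        "pipeline": "train_pipeline",
        "step_number": 0,
        "step_name": "input",
    }
}

label = describe_step(example_config)
print(label)  # train_pipeline/step 0 (input)
```

Using .get() with fallbacks keeps the helper from raising if a field is absent.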

load_data()

Loads, as a dict, the data saved in the previous step.
Note: this is called by default in the Asset's __init__(). The Asset in the first position of the ML Pipeline has no data handed over from a previous Asset, so it does not call load_data().

Return
- data in dict form

Example

# Previous Asset
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
output = {'dataframe': df}
self.asset.save_data(output)
-------------------------------------------------------
# Current Asset
self.output = self.asset.load_data()
print(self.output['dataframe'])

Output
```
   0  1  2
0  1  2  3
1  4  5  6
```

load_args()

Returns, as a dict, the parameters written as the current Asset's args in the user_parameter section of experimental_plan.yaml. Note: this is called by default in the Asset's __init__().

Return
- arguments in dict form

Example

# experimental_plan.yaml
- step: output
  args:
    - args_test: sample_args
-------------------------------------------------------
# Asset
self.args = self.asset.load_args()
print(self.args)

Output
```
{'args_test': 'sample_args'}
```
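
The YAML above lists args as single-key mappings, while load_args() returns one flat dict. The shape change can be pictured with the sketch below. It starts from the already-parsed Python structure and only illustrates the conversion; it is not ALO's actual loader, and flatten_args is a hypothetical name.

```python
def flatten_args(args_list: list) -> dict:
    """Merge a list of small mappings into one dict; later entries win on duplicate keys."""
    merged = {}
    for entry in args_list:
        merged.update(entry)
    return merged


# What a YAML parser would produce for the `args` list shown above.
parsed_args = [{"args_test": "sample_args"}]

flat_args = flatten_args(parsed_args)
print(flat_args)  # {'args_test': 'sample_args'}
```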

check_args(arg_key: str, is_required: bool, default: str, chng_type: str)

When a value is missing from the args written in experimental_plan.yaml, check_args fills one in so that no error occurs. If experimental_plan.yaml accumulates too many args, it becomes hard for users to configure. In that case, set is_required=False and supply default="value" so that the Asset runs with the default, which keeps experimental_plan.yaml simple.

Parameter
- arg_key (str) : name of the parameter written under args in experimental_plan.yaml
- is_required (bool) : whether the parameter must be provided
- default (str) : value to use when the user parameter does not exist
- chng_type (str) : type to convert to; one of list, str, int, float, bool

Return
- arg_value (str) : the resulting value (the default if the argument was missing)

Example

x_columns = self.asset.check_args(arg_key="x_columns", is_required=True, chng_type="list")
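
The behaviour described above (fall back to a default when an optional argument is missing, fail when a required one is missing, then convert the type) can be sketched as follows. This is an illustrative stand-in under assumptions, not ALO's implementation: check_args_sketch is a hypothetical function, it takes the args dict explicitly, and the conversion rules for each chng_type are guesses.

```python
def check_args_sketch(args: dict, arg_key: str, is_required: bool = False,
                      default: str = "", chng_type: str = "str"):
    """Illustrative stand-in for the behaviour described above, not ALO's code."""
    if arg_key in args and args[arg_key] is not None:
        value = args[arg_key]
    elif is_required:
        raise ValueError(f"Required argument '{arg_key}' is missing.")
    else:
        value = default

    # Assumed conversion rules for the documented chng_type values.
    if chng_type == "list":
        return value if isinstance(value, list) else [value]
    if chng_type == "bool":
        return value if isinstance(value, bool) else str(value).lower() == "true"
    converters = {"str": str, "int": int, "float": float}
    return converters[chng_type](value)


user_args = {"x_columns": "x1", "n_estimators": "100"}
x_columns = check_args_sketch(user_args, "x_columns", is_required=True, chng_type="list")
n_estimators = check_args_sketch(user_args, "n_estimators", chng_type="int")
y_column = check_args_sketch(user_args, "y_column", default="", chng_type="str")
print(x_columns, n_estimators, repr(y_column))  # ['x1'] 100 ''
```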

Path API

Use the Path API to save or load data at paths that are compatible with the paths the Mellerikat system provides or uses.

get_input_path()

Returns the path where the pipeline's input data is stored.

Return
- Train pipeline : {project_home}/input/train
- Inference pipeline : {project_home}/input/inference
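
Once the input path is known, an Asset might scan that folder for data files. A minimal sketch, assuming the folder holds CSV files: the temporary directory stands in for the value returned by get_input_path(), and list_csv_files is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def list_csv_files(input_dir: str) -> list:
    """Return the names of all .csv files under input_dir (recursively), sorted."""
    return sorted(p.name for p in Path(input_dir).rglob("*.csv"))


# Stand-in for: input_dir = self.asset.get_input_path()
with tempfile.TemporaryDirectory() as input_dir:
    (Path(input_dir) / "batch1").mkdir()
    (Path(input_dir) / "batch1" / "train.csv").write_text("x1,y\n1,0\n")
    (Path(input_dir) / "readme.txt").write_text("not data")
    found = list_csv_files(input_dir)

print(found)  # ['train.csv']
```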

get_model_path(use_inference_path=False)

Returns the path for saving the trained model when the train pipeline runs. It can also be used to obtain the model path so that a model created in the train pipeline can be loaded in the inference pipeline.
By default, if the train and inference pipelines share an Asset, the inference pipeline receives the absolute path under train_artifacts that ends with the shared Asset's folder name. For example, calling get_model_path() from the preprocess Asset of the inference pipeline returns {project_home}/train_artifacts/models/preprocess/.
Note: an Asset named inference is treated like a reserved word and is paired with the train model path. For example, calling get_model_path() in a step named inference returns train_artifacts/models/train/.

Parameter
- use_inference_path : when True, returns the model path as an absolute path of the form inference_artifacts/model/

Return
- Train pipeline : .train_artifacts/models/{step name}/
- Inference pipeline : .train_artifacts/models/{step name (train in the inference step)}/
- When use_inference_path=True : .inference_artifacts/models/{step name (inference in the inference step)}/
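
One way to use the returned path is to pickle the trained model into it in the train pipeline and read it back in the inference pipeline. This is a sketch under assumptions: the temporary directory stands in for get_model_path(), the file name model.pkl is arbitrary, the dict is a placeholder for a real model object, and save_model and load_model are hypothetical helpers.

```python
import os
import pickle
import tempfile


def save_model(model, model_dir: str, file_name: str = "model.pkl") -> str:
    """Pickle `model` into model_dir and return the full file path."""
    os.makedirs(model_dir, exist_ok=True)
    file_path = os.path.join(model_dir, file_name)
    with open(file_path, "wb") as f:
        pickle.dump(model, f)
    return file_path


def load_model(model_dir: str, file_name: str = "model.pkl"):
    """Load the pickled model back from model_dir."""
    with open(os.path.join(model_dir, file_name), "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as model_dir:
    # Train step: model_dir would come from self.asset.get_model_path()
    save_model({"weights": [0.1, 0.2]}, model_dir)
    # Inference step: the paired train path is returned, so the model can be loaded.
    restored = load_model(model_dir)

print(restored)  # {'weights': [0.1, 0.2]}
```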

get_report_path()

Returns the path for saving reports generated in the train pipeline. It cannot be used in the inference pipeline.

Return
- {project_home}/train_artifacts/report/

get_output_path()

Used to make inference data available for retraining. Save output.jpg or output.csv in the path returned by this API. For special purposes, both output.jpg and output.csv may be saved. The saved files are delivered to Edge Conductor, where they are re-labeled through the UI.

Return
- Train pipeline : {project_home}/train_artifacts/output/
- Inference pipeline : {project_home}/inference_artifacts/output/
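
Writing output.csv into the returned folder could look like the sketch below. It uses only the standard library; the temporary directory stands in for get_output_path(), the rows are made-up predictions, and write_output_csv is a hypothetical helper.

```python
import csv
import os
import tempfile


def write_output_csv(output_dir: str, rows: list) -> str:
    """Write rows (a list of dicts with identical keys) to output_dir/output.csv."""
    file_path = os.path.join(output_dir, "output.csv")
    with open(file_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return file_path


predictions = [{"x1": 1, "prediction": "OK"}, {"x1": 4, "prediction": "NG"}]

with tempfile.TemporaryDirectory() as output_dir:
    # Stand-in for: output_dir = self.asset.get_output_path()
    csv_path = write_output_csv(output_dir, predictions)
    # Read the file back to confirm what was written.
    with open(csv_path, newline="") as f:
        written = list(csv.DictReader(f))

print(written[1]["prediction"])  # NG
```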

get_extra_output_path()

Provides a path for storing the inference pipeline's output, or data derived from that output, that is to be loaded into an external Dashboard or Database.

Return
- Train pipeline : {project_home}/train_artifacts/extra_output/{step name}
- Inference pipeline : {project_home}/inference_artifacts/extra_output/{step name}

Inference Summary API

Used in the inference pipeline to create and modify the inference summary.

save_summary()

Summarizes the inference result and saves it in dict form; Edge Conductor then presents it in its UI. The inference summary requires the result and score fields.

Parameter
- result : the result of the inference run. Up to 32 characters.
- score : can be used as a retrain condition; for example, a probability. Must be between 0 and 1 inclusive and is displayed to two decimal places.
- probability : when inferring on a single file, the probability values for all labels.
- note : remarks from the AI Solution about the inference. Up to 128 characters.

Example

# {solution_name}/assets/output/asset_output.py
summary = {}
summary['result'] = 'OK'  # e.g. from model.predict()  ## Mandatory
summary['score'] = 0.98  # e.g. from model.predict_proba()  ## Mandatory
summary['note'] = "The score represents the probability value of the model's prediction result."  ## Mandatory
summary['probability'] = {'OK': 0.65, 'NG1': 0.25, 'NG2': 0.1}  # e.g. from model.predict_proba()  ## Optional
self.asset.save_summary(result=summary['result'], score=summary['score'], note=summary['note'], probability=summary['probability'])
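
The limits listed under Parameter can be checked before calling save_summary(). The sketch below is a hypothetical pre-check, not part of ALO: build_summary is an invented helper, and how ALO itself reacts to out-of-range values is not covered here.

```python
def build_summary(result: str, score: float, note: str = "", probability=None) -> dict:
    """Validate the documented limits and return keyword arguments for save_summary()."""
    if len(result) > 32:
        raise ValueError("result must be at most 32 characters")
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if len(note) > 128:
        raise ValueError("note must be at most 128 characters")
    return {
        "result": result,
        "score": round(score, 2),  # the score is displayed to two decimal places
        "note": note,
        "probability": probability or {},
    }


summary = build_summary("OK", 0.9812, note="Score is the predicted probability.",
                        probability={"OK": 0.65, "NG1": 0.25, "NG2": 0.1})
print(summary["score"])  # 0.98
```

Inside an Asset, the returned dict could then be passed on as self.asset.save_summary(**summary).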

load_summary()

Loads, as a dict, the inference summary created by one of the earlier Assets in the inference pipeline. Use it to modify the summary so that it fits the data.

Return
- Returns a dict with the keys result, score, note, and probability.

Log API

To leave logs while developing an Asset, use the Log API. Log messages are printed to the terminal and saved to the pipeline.log file.

The API prints logs in the following format and also saves them to the file.

[time | USER | log level | file(line) | function]
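
For readers who want to see how such a line is assembled, the format can be reproduced with Python's standard logging module. This is only a sketch of the layout, assuming the fields map onto standard LogRecord attributes; it is not ALO's actual logger configuration, and USER_LOG_FORMAT is a name chosen here.

```python
import logging

USER_LOG_FORMAT = (
    "[%(asctime)s|USER|%(levelname)s|%(filename)s(%(lineno)d)|%(funcName)s()]: %(message)s"
)

formatter = logging.Formatter(USER_LOG_FORMAT)

# Build a record by hand so the formatted line is easy to inspect.
record = logging.LogRecord(
    name="user", level=logging.INFO, pathname="asset_input.py", lineno=132,
    msg="Sample info message", args=None, exc_info=None, func="get_data",
)
line = formatter.format(record)
print(line)
```

The printed line starts with the current timestamp and ends with |USER|INFO|asset_input.py(132)|get_data()]: Sample info message.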

save_info(msg: str)

Prints the message to the terminal as an info-level log in the log format and saves it to the pipeline.log file.

Parameter
- msg : the message, as a string

Example

self.asset.save_info('Sample info message')

Output

# Console & pipeline.log
[2024-01-30 09:45:23,611|USER|INFO|asset_input.py(132)|get_data()]: Sample info message

save_warning(msg: str)

Prints the message to the terminal as a warning-level log in the log format and saves it to the pipeline.log file.

Parameter
- msg : the message, as a string

Example

self.asset.save_warning('Sample warning message')

Output

# Console & pipeline.log    
[2024-01-30 09:45:23,611|USER|WARNING|asset_input.py(132)|get_data()]: Sample warning message

save_error(msg: str)

Prints the message to the terminal as an error-level log in the log format, saves it to the pipeline.log file, and then forcibly terminates the process.

Parameter
- msg : the message, as a string

Example

self.asset.save_error('Sample error message')

Output

# Console & pipeline.log    
[2024-04-24 22:59:18,446|USER|ERROR|data_input.py(105)|save_error()]    
============================= ASSET ERROR =============================
TIME(UTC)   : 2024-04-24 22:59:18,446    
PIPELINE    : inference_pipeline
STEP        : input
ERROR(msg)  : Sample error message
=======================================================================
