
Building the Pipeline

Updated 2025.02.20

To register user-written modeling code as an AI Solution, the code must first be converted into the ALO format. This conversion involves seven major modifications.

Follow the guide below to convert the modeling code and create an AI Solution.



Developing the Pipeline

This section explains how to convert user-written modeling code into the ALO format. A deep understanding of ALO v3 is not required; the conversion can be done easily with only minimal modifications to the logic code.

1. Add a pipeline argument to the function definition

The pipeline argument provides information such as the data storage path, along with logger functionality.

Example)

def preprocess():  →  def preprocess(pipeline: dict):  # Add pipeline to all functions

The logger functionality provided by ALO can be used as follows.

Example) "train" can be modified to the content the user wants to log.

...
logger = pipeline['logger'] # ALO syntax
logger.debug("train") # "train" can be modified
...
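For reference, the two changes above can be combined as in the following minimal sketch. This is not ALO reference code; the function body and the load_data() helper are hypothetical.

def preprocess(pipeline: dict):  # pipeline argument added
    logger = pipeline['logger']  # ALO syntax
    logger.debug("preprocess started")  # message is up to the user
    df = load_data()  # hypothetical user logic
    return {"output": "preprocess is done"}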

2. Modify the parts of the existing code that load from hard-coded paths so that the paths are received through the pipeline

Example: Existing code → ALO format code

def train(pipeline: dict):
    ...
    pd.read_csv('a.csv') → pd.read_csv(pipeline['dataset']['workspace'] + "/a.csv")
    ...

def inference(pipeline: dict):
    ...
    pd.read_csv('b.csv') → pd.read_csv(pipeline['dataset']['workspace'] + "/b.csv")
    ...
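The same conversion can also be written with os.path.join, which avoids separator mistakes when concatenating paths. This is a sketch based on the pipeline['dataset']['workspace'] key shown above; using os.path.join is an implementation choice, not an ALO requirement.

import os
import pandas as pd

def train(pipeline: dict):
    # Dataset directory provided by ALO through the pipeline
    data_path = os.path.join(pipeline['dataset']['workspace'], "a.csv")
    df = pd.read_csv(data_path)
    ...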

3. To pass values between functions, use the function's return value

Example) A value returned by function a can be read in function b as pipeline['a']['result']. Supported return types include dict, str, int, etc.

def preprocess(pipeline: dict):
    logger = pipeline['logger']
    logger.info("preprocess.")
    return {"output": "preprocess is done"}

def train(pipeline: dict, x_columns=[], y_column=None, n_estimators=100):
    logger = pipeline['logger']
    logger.debug("train")
    ...
    preprocess_check = pipeline['preprocess']['result']  # {"output": "preprocess is done"}
    ...
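Putting the pattern together, a value returned by train can be consumed later in inference in the same way. This is a minimal sketch; the returned keys (status, model_version) are hypothetical.

def train(pipeline: dict):
    ...
    # Stored by ALO under pipeline['train']['result']
    return {"status": "trained", "model_version": 1}

def inference(pipeline: dict):
    # Read train's return value (keys are hypothetical)
    train_result = pipeline['train']['result']
    if train_result["status"] != "trained":
        raise RuntimeError("train step did not complete")
    ...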

4. Modify the part where the model is saved and loaded

Example) The method depends on the model type.

### If the model is not pickle ###
def train(pipeline: dict):
    ...
    # model save
    model_path = pipeline['model']['workspace']
    tf.saved_model.save(model, model_path)  # e.g., TensorFlow SavedModel
    ...

def inference(pipeline: dict):
    ...
    # model load
    model_path = pipeline['model']['workspace']
    model = tf.saved_model.load(model_path)

### If the model is pickle ###
def train(pipeline: dict):
    ...
    # model save
    pipeline['model']['file_name'] = model
    ...

def inference(pipeline: dict):
    ...
    # model load
    model = pipeline['model']['file_name']
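As an end-to-end sketch of the pickle case, assuming ALO pickles whatever object is assigned to pipeline['model']['file_name'] and returns it on load, as shown above. The RandomForestClassifier and the train.csv file name are illustrative assumptions, not part of the ALO API.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def train(pipeline: dict, x_columns=[], y_column=None, n_estimators=100):
    df = pd.read_csv(pipeline['dataset']['workspace'] + "/train.csv")  # hypothetical file
    X = pd.get_dummies(df[x_columns])
    y = df[y_column]
    model = RandomForestClassifier(n_estimators=n_estimators)
    model.fit(X, y)
    pipeline['model']['file_name'] = model  # ALO pickles and saves the model object

def inference(pipeline: dict):
    model = pipeline['model']['file_name']  # ALO loads the pickled model object
    ...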

5. Return the variables you want to save in the result

Example) inference function return

    return {
        'extraOutput': '',
        'summary': {
            'result': f"#survived:{num_survived} / #total:{num_total}",
            'score': round(survival_ratio, 3),
            'note': "Score means titanic survival ratio",
            'probability': {"dead": avg_proba_dead, "survived": avg_proba_survived}
        }
    }
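The summary values above can be computed as in the following sketch, which continues the titanic example and the pickle load from step 4. The test.csv file name and the predict_proba column order ([dead, survived]) are assumptions.

import pandas as pd

def inference(pipeline: dict, x_columns=[]):
    model = pipeline['model']['file_name']  # pickled model from step 4
    df = pd.read_csv(pipeline['dataset']['workspace'] + "/test.csv")  # hypothetical file
    X = pd.get_dummies(df[x_columns])
    proba = model.predict_proba(X)  # assumed column order: [dead, survived]
    pred = proba.argmax(axis=1)
    num_total = len(pred)
    num_survived = int(pred.sum())
    survival_ratio = num_survived / num_total
    avg_proba_dead, avg_proba_survived = proba.mean(axis=0)
    return {
        'extraOutput': '',
        'summary': {
            'result': f"#survived:{num_survived} / #total:{num_total}",
            'score': round(survival_ratio, 3),
            'note': "Score means titanic survival ratio",
            'probability': {"dead": avg_proba_dead, "survived": avg_proba_survived}
        }
    }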

6. (Optional) Arguments written in the experimental_plan.yaml file can be used in the function

Example) Usage of x_columns

### General usage ###
## titanic.py ##
def train(pipeline: dict, x_columns=[], y_column=None, n_estimators=100):
    ...
    X = pd.get_dummies(df[x_columns])
    ...

### Using x_columns written in yaml file ###
## experimental_plan.yaml ##
train:
    def: titanic.train
    argument:
        x_columns: [ 'Pclass', 'Sex', 'SibSp', 'Parch' ]
        y_column: Survived

## titanic.py ##
def train(pipeline: dict, x_columns=[], y_column=None, n_estimators=100):
    ...
    X = pd.get_dummies(df[pipeline['train']['argument']['x_columns']])
    ...
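The two access styles above appear to expose the same values, both as function parameters and under pipeline['train']['argument']. Assuming pipeline['train']['argument'] behaves like a plain dict (an assumption, not confirmed ALO behavior), a defensive variant could fall back to the parameter defaults:

def train(pipeline: dict, x_columns=[], y_column=None, n_estimators=100):
    logger = pipeline['logger']
    # Prefer yaml-provided columns; fall back to the parameter default
    cols = pipeline['train']['argument'].get('x_columns', x_columns)
    logger.debug(f"training with columns: {cols}")
    # df loaded as in step 2 (omitted)
    X = pd.get_dummies(df[cols])
    ...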

7. Other cases where a file save path is required (the v2 equivalent of get_extra_output_path())

Example)

def train(pipeline: dict, x_columns=[], y_column=None, n_estimators=100):
    ...
    pipeline['train']['external_path'] = 'True'
    ...
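The exact ALO v3 key that replaces v2's get_extra_output_path() is not shown in this guide, so the sketch below is a hypothetical illustration only. The 'extra_output' key is an assumption; consult the ALO v3 reference for the actual path key before using it.

import os

def train(pipeline: dict, x_columns=[], y_column=None, n_estimators=100):
    pipeline['train']['external_path'] = 'True'  # as shown above
    # HYPOTHETICAL key — verify against the ALO v3 reference
    out_dir = pipeline.get('extra_output', {}).get('workspace', '.')
    with open(os.path.join(out_dir, "report.txt"), "w") as f:
        f.write("training report\n")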

Once the conversion is complete, AI Solution users can run the AI Solution directly with the 'alo run' CLI command.