
# Experimentation

Updated 2024.05.05

Once the development of each Asset constituting the ML Pipeline is complete, review the experimental_plan.yaml and follow the guide below to run and experiment with the ML pipeline.

## Running the AI Pipeline

```bash
python main.py                   # Run the entire pipeline (train + inference)
python main.py --mode train      # Run only the train pipeline
python main.py --mode inference  # Run only the inference pipeline
```

When ALO's main.py is executed, it downloads the code for each Asset into the alo/asset folder from the git repository specified under asset_source in experimental_plan.yaml. <br/>
As each asset is downloaded, ALO installs the dependency packages listed under its requirements key in experimental_plan.yaml. <br/>
You can either write package names directly, such as pandas==1.5.3, or write requirements.txt to install the packages listed in the requirements.txt of the Asset's git repository.
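
For example, a step that installs from the repository's requirements.txt could be declared as below. This is a minimal sketch: the structure mirrors the asset_source example later in this section, and only the requirements entry differs.

```yaml
asset_source:
- train_pipeline:
    - step: input
      source:
        code: http://mod.lge.com/hub/smartdata/ml-framework/alov2-module/input.git
        branch: tabular_2.0
        requirements:
        - requirements.txt  # install the packages listed in the asset repository's requirements.txt
```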

If different assets require different versions of the same package, package conflicts can occur. <br/>
(e.g., the input asset needs pandas==1.5.3 while the train asset needs pandas==1.5.4) <br/>
In such cases, if the asset that comes later in the AI pipeline specifies an option such as pandas==1.5.4 --force-reinstall, ALO will reinstall version 1.5.4 even though pandas==1.5.3 is already installed. <br/>
(e.g., the train asset comes after the input asset) <br/>
Asset developers should use this mechanism to prevent dependency conflicts between earlier and later assets when developing AI Contents or AI Solutions, as sketched below.
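
The following sketch illustrates such a conflict resolution. It assumes a train asset repository URL patterned on the input asset's URL; that URL is a placeholder, not a confirmed address:

```yaml
asset_source:
- train_pipeline:
    - step: input
      source:
        code: http://mod.lge.com/hub/smartdata/ml-framework/alov2-module/input.git
        branch: tabular_2.0
        requirements:
        - pandas==1.5.3
    - step: train  # runs after the input step
      source:
        code: http://mod.lge.com/hub/smartdata/ml-framework/alov2-module/train.git  # assumed URL
        branch: tabular_2.0
        requirements:
        - pandas==1.5.4 --force-reinstall  # overrides the version installed by the input step
```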

**_Note:_** If you have changed AI Contents, create a fresh virtual environment (e.g., with Anaconda or Pyenv+Pipenv) and run main.py there, to prevent conflicts between packages already installed in the existing environment and the newly installed ones.

```yaml
# experimental_plan.yaml
asset_source:
- train_pipeline:
    - step: input
      source:
        code: http://mod.lge.com/hub/smartdata/ml-framework/alov2-module/input.git
        branch: tabular_2.0
        requirements:
        - pandas==1.5.3
```

After installing the dependencies of each Asset, ALO copies the external data defined in external_path to the alo/input/train (or inference) path.
In the input Asset within the train or inference pipeline, the data path can be obtained through the self.asset.get_input_path() API, which returns the absolute path to the alo/input/train (or inference) folder.  
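
As a minimal sketch, reading data in the input Asset might look like the following. The class name UserAsset and the file name train.csv are illustrative placeholders, and self.asset is assumed to be provided by the ALO asset base class at runtime:

```python
import os

import pandas as pd


class UserAsset:
    """Illustrative input asset sketch; not the full ALO asset template."""

    def run(self):
        # Absolute path to alo/input/train (or alo/input/inference), resolved by ALO
        data_dir = self.asset.get_input_path()
        # train.csv is a placeholder file name for this sketch
        df = pd.read_csv(os.path.join(data_dir, "train.csv"))
        return df
```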

**_Note:_** After solution registration, ALO is wrapped in a Docker Container in operating environments such as AI Conductor and Edge App, so it only accesses the promised local path alo/input/train (or inference) inside the Docker Container to retrieve input data. Therefore, the path for loading input data must always be obtained through the get_input_path() API.



## Running with My Data

In experimental_plan.yaml, set your data path under external_path and adjust the user_parameters to suit your data. When creating an AI Solution based on AI Contents, AI Solution users must satisfy the recommended data specifications in the AI Solution developer's guide.
For example, if parameters such as x_columns and y_column are mandatory, they must be set to match your data. Modify the other user_parameters likewise by referring to the guide written by the AI Solution developer, as sketched below.
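
A minimal sketch of such a configuration follows. The step name, argument keys, and column names are illustrative placeholders; the actual schema is defined by each AI Content:

```yaml
user_parameters:
- train_pipeline:
    - step: input                          # illustrative step name
      args:
        x_columns: [feature_1, feature_2]  # placeholder feature columns
        y_column: label                    # placeholder target column
```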

**_Note:_** When a path such as a data path is written as a relative path, the directory containing main.py serves as the base path.

```yaml
# experimental_plan.yaml
external_path:
- load_train_data_path: [./solution/sample_data/train/]
- load_inference_data_path: ./solution/sample_data/inference/
```