Version: Next

AI Contents Experiment

Updated 2024.05.05

If you have installed ALO and AI Contents, you can experiment by modifying the various features provided by AI Contents to suit your data.

Topics

Running the ML Pipeline
Running with My Data

Running the ML Pipeline

python main.py
python main.py --mode train # Run only the train pipeline
python main.py --mode inference # Run only the inference pipeline

When ALO's main.py runs, it downloads the code for each asset into the alo/asset folder from the git repository given for that asset under asset_source in experimental_plan.yaml.
As each asset is downloaded, the dependencies listed under its requirements key are installed. Each entry is either a package specification such as pandas==1.5.3, or the file name requirements.txt, which installs the packages listed in the requirements.txt of the asset's git repository.
Conflicts can occur when different assets require different versions of the same package. For instance, if the input asset needs pandas==1.5.3 and the train asset needs pandas==1.5.4, you can append an option to the requirement of the asset that runs later in the ML pipeline, as in pandas==1.5.4 --force-reinstall, so that the later asset's version replaces the one installed earlier.
AI Solution developers should handle these dependency conflicts between higher and lower priority assets while modifying or using AI Contents.
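
Before running the pipeline, it can help to see which packages are pinned to different versions by different assets. The following is a minimal sketch, not part of ALO: the `pipeline` list is a hypothetical in-memory copy of the step and requirements entries from experimental_plan.yaml, the scikit-learn pin is purely illustrative, and `find_pin_conflicts` is a helper name made up for this example.

```python
from collections import defaultdict

# Hypothetical in-memory copy of the requirements listed per step in
# experimental_plan.yaml. Step names and pins are illustrative only.
pipeline = [
    {"step": "input", "requirements": ["pandas==1.5.3", "requirements.txt"]},
    {"step": "train", "requirements": ["pandas==1.5.4 --force-reinstall",
                                       "scikit-learn==1.2.2"]},
]


def find_pin_conflicts(steps):
    """Return {package: [(step, version), ...]} for packages pinned to
    more than one distinct version across the given steps."""
    pins = defaultdict(list)
    for step in steps:
        for requirement in step["requirements"]:
            parts = requirement.split()
            if not parts:
                continue
            spec = parts[0]  # drop trailing pip options such as --force-reinstall
            if "==" not in spec:
                continue  # e.g. "requirements.txt" or an unpinned package
            name, version = spec.split("==", 1)
            pins[name.lower()].append((step["step"], version))
    return {
        name: entries
        for name, entries in pins.items()
        if len({version for _, version in entries}) > 1
    }


conflicts = find_pin_conflicts(pipeline)
for name, entries in conflicts.items():
    print(name, entries)
```

The sketch only inspects exact pins written directly in the plan. Packages pulled in through an asset's requirements.txt are not visible to it and would need to be checked separately.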

Note: If you switch from one AI Content (A) to another (B), create a new Python 3.10 virtual environment first, using a tool such as Anaconda or Pyenv + Pipenv. Running main.py in a fresh environment prevents conflicts between packages installed for the previous AI Content and the ones the new AI Content requires.

# experimental_plan.yaml
asset_source:
    - train_pipeline:
        - step: input
          source:
            code: {input Asset git address}
            branch: tabular_2.0
            requirements:
              - pandas==1.5.3
              - requirements.txt

Once the dependencies of each asset are installed, ALO copies the external data defined in the external_path of experimental_plan.yaml to alo/input/train or alo/input/inference, depending on the pipeline. Assets that load data within the train or inference pipeline can get the data path using the asset.get_input_path() API, which returns the absolute path to alo/input/train (or alo/input/inference).
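
Once an asset has the input path, a common next step is to collect the data files beneath it. The sketch below illustrates that step under stated assumptions: the data is assumed to be CSV files, `list_data_files` is a hypothetical helper, and a temporary directory stands in for the path that asset.get_input_path() would return inside ALO.

```python
import tempfile
from pathlib import Path


def list_data_files(input_dir, pattern="*.csv"):
    """Recursively collect files matching `pattern` under `input_dir`,
    sorted so that the load order is deterministic."""
    return sorted(Path(input_dir).rglob(pattern))


# A temporary directory stands in for the directory an asset would
# receive from asset.get_input_path(); the file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    sample_dir = Path(tmp) / "train1"
    sample_dir.mkdir()
    (sample_dir / "a.csv").write_text("x,y\n1,2\n")
    (sample_dir / "notes.txt").write_text("not a data file")
    found = [path.name for path in list_data_files(tmp)]

print(found)
```

Searching recursively is a design choice for this sketch: it keeps working whether the copied data sits directly in the input folder or in subfolders.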

Running with My Data

In experimental_plan.yaml, specify the path to the data to be imported in external_path and configure the user parameters of each asset in AI Contents to suit the data.
To build an AI Solution based on AI Contents, your data must meet the data specification recommended for that AI Content. For instance, if there are mandatory parameters such as x_columns and y_column, set them to the column names in your data.
You can refer to the AI Contents guide to modify each user_parameter to fit your data.
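
A quick way to catch a mismatch between the parameters and the data is to compare the configured column names with the header of the data file. This is a minimal sketch rather than ALO functionality: `missing_columns` is a hypothetical helper, the sample CSV and column names are invented, and only the parameter names x_columns and y_column come from the text above.

```python
import csv
import io


def missing_columns(header, x_columns, y_column):
    """Return the configured column names that do not appear in the header."""
    return [name for name in [*x_columns, y_column] if name not in header]


# Invented sample data; in practice the header would be read from a file
# under the path configured in external_path.
sample_csv = io.StringIO("age,income,label\n30,100,1\n")
header = next(csv.reader(sample_csv))

missing = missing_columns(header, x_columns=["age", "income", "tenure"], y_column="label")
print(missing)
```

An empty result means every configured column exists in the data. Anything else lists the names to fix in the user parameters or in the data itself.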

Note: Data paths in external_path can be absolute or relative. Relative paths are resolved against the directory that contains ALO's main.py. To use multiple external data paths for one key, specify them as a list.

# experimental_plan.yaml
external_path:
    - load_train_data_path: ["./solution/sample_data/train1", "./solution/sample_data/train2"]
    - load_inference_data_path: ./solution/sample_data/inference
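
The path rules in the note above can be sketched as follows. This is an illustration of the described behavior, not ALO's actual implementation: `resolve_external_paths` and the `alo_home` directory are assumptions introduced for the example.

```python
from pathlib import Path


def resolve_external_paths(value, base_dir):
    """Normalize an external_path value (a single string or a list of
    strings) into a list of absolute paths. Relative entries are joined
    to `base_dir`, the directory assumed to contain main.py."""
    entries = [value] if isinstance(value, str) else list(value)
    base = Path(base_dir)
    resolved = []
    for entry in entries:
        path = Path(entry)
        resolved.append(path if path.is_absolute() else base / path)
    return resolved


# Hypothetical location of ALO's main.py, used only for this example.
alo_home = Path.cwd() / "alo"

train_paths = resolve_external_paths(
    ["./solution/sample_data/train1", "./solution/sample_data/train2"], alo_home
)
inference_paths = resolve_external_paths("./solution/sample_data/inference", alo_home)

for path in train_paths + inference_paths:
    print(path)
```

Accepting either a string or a list mirrors the two forms shown in the YAML example, so both keys can be handled by the same code.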
