Creating an AI Solution Without AI Contents
If you run into limitations in application scope or performance when using existing AI Contents to solve your problem, you can develop new AI Contents. You can use the 'Titanic' example provided by ALO as a reference for creating your own AI Contents, and once you are familiar with Asset development, you can develop from scratch starting from the Asset Template.
The 'Titanic' example provided by ALO offers practical guidelines necessary for effectively developing new AI Contents. By following this example, users can understand the process of creating AI Contents, apply it to their projects, and implement AI Contents in a short time.
Topics
Installing Titanic
In the Titanic example, Assets are assembled into ML pipelines: input → train for training, and input → inference → output for inference. The RandomForestClassifier model is saved by the train pipeline and loaded by the inference pipeline.
First, install the Titanic solution from the path where ALO's main.py exists, as follows:
```bash
git clone https://github.com/mellerikat/titanic.git solution
```
Explanation of Titanic
If you open solution/sample_data in the Titanic example, you will see the train_data and inference_data folders, which contain train.csv and test.csv, respectively.
```
./solution/
├── sample_data/
│   ├── train_data/
│   │   └── train.csv
│   └── inference_data/
│       └── test.csv
└── experimental_plan.yaml
```
Each csv file contains information about the passengers of the Titanic; train.csv has an additional column, Survived, which is used as the training label. A Survived value of 1 means the passenger survived, while 0 means they did not. After cloning the Titanic repository into the solution folder and running python main.py, an assets folder is created, and under it, Asset folders named input, train, inference, and output are installed according to the configuration in solution/experimental_plan.yaml. Each Asset plays the following main roles.
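The fragment below sketches the parts of experimental_plan.yaml that this walkthrough refers to. It is illustrative only: the exact layout and key names can vary between ALO versions, and the column lists shown are assumptions, so compare against the experimental_plan.yaml shipped with the Titanic repository.

```yaml
# Illustrative fragment -- compare with solution/experimental_plan.yaml.
external_path:
  - load_train_data_path: ./solution/sample_data/train_data/
  - load_inference_data_path: ./solution/sample_data/inference_data/

user_parameters:
  - train_pipeline:
      - step: input
        args:
          - x_columns: [Pclass, Sex, SibSp, Parch]   # assumed feature columns
            y_column: Survived
      - step: train
        args:
          - n_estimators: 100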
- input
Through ALO, the input Asset copies the data folders from the paths specified in load_train_data_path or load_inference_data_path under external_path in experimental_plan.yaml into the input/train (or input/inference) folder. It then reads the data from the path returned by the self.asset.get_input_path() API. Under the default input path there is a folder with the same name as the last folder of the external path specified by the user; write your code so that it reads the data from that path.
In the Titanic example, the input Asset then reads a single csv file into a dataframe and assigns it to the self.data dictionary variable under a key such as self.data['dataframe0']. In addition, the self.config dictionary variable carries the x_columns and y_column values specified in the args section of user_parameters in experimental_plan.yaml, which are passed on to the next Asset. Data and config are passed to the next Asset using the self.asset.save_data(self.data) and self.asset.save_config(self.config) APIs, respectively; a minimal sketch follows the note below.
Note: If a given Asset deletes keys from the data or config it received from the previous Asset, ALO raises an error. This prevents corruption of the outputs created by previous Assets, since each Asset may be maintained by a different developer.
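Putting the pieces above together, a run() body for the input Asset might look like the sketch below. Only the self.asset.* calls are the documented ALO APIs; the surrounding scaffolding (the Asset class, the self.data and self.config dictionaries) comes from the Asset Template, and the file-discovery logic is an assumption rather than the exact Titanic implementation.

```python
import os
import pandas as pd

def run(self):  # body of the input Asset's run(); class scaffolding omitted
    # ALO has already copied the external data folder below this path; the
    # copied folder keeps the name of the last folder of the external path.
    input_path = self.asset.get_input_path()
    csv_files = []
    for root, _, files in os.walk(input_path):
        csv_files += [os.path.join(root, f) for f in files if f.endswith('.csv')]
    self.data['dataframe0'] = pd.read_csv(csv_files[0])

    # x_columns / y_column come from the args of user_parameters
    # in experimental_plan.yaml.
    args = self.asset.load_args()
    self.config['x_columns'] = args['x_columns']
    self.config['y_column'] = args['y_column']

    # Hand data and config to the next Asset.
    self.asset.save_data(self.data)
    self.asset.save_config(self.config)
```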
- train
In the train Asset of the Titanic example, the data received from the input Asset is loaded with the self.asset.load_data() API, and the config with the self.asset.load_config() API. The x_columns and y_column to be used from the dataframe are then extracted to train a simple scikit-learn RandomForestClassifier model. The TITANIC class, which defines the model's training and inference functions, lives in titanic_source.py and is used by both the train and inference Assets.
RandomForestClassifier has many hyperparameters; of these, n_estimators, the number of trees, is exposed as an argument in experimental_plan.yaml. The train Asset loads this argument using the self.asset.load_args() API and inserts it into the model parameters.
After training is completed, the model is saved as a file named random_forest_model.pkl in the model path. The model path can be received through the self.asset.get_model_path() API, and the file is saved in that path. Logging and error handling can also be performed using APIs like self.asset.save_info(), self.asset.save_warning(), and self.asset.save_error().
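A corresponding run() body for the train Asset might look like this. The Titanic example actually delegates training to the TITANIC class in titanic_source.py; the inline model code and pickle-based saving here are a simplified assumption, and the get_dummies call stands in for whatever feature encoding the real example performs.

```python
import os
import pickle
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def run(self):  # body of the train Asset's run(); class scaffolding omitted
    data = self.asset.load_data()      # outputs of the input Asset
    config = self.asset.load_config()
    args = self.asset.load_args()      # n_estimators from experimental_plan.yaml

    df = data['dataframe0']
    X = pd.get_dummies(df[config['x_columns']])  # stand-in for the encoding
                                                 # done in titanic_source.py
    y = df[config['y_column']]

    model = RandomForestClassifier(n_estimators=int(args['n_estimators']))
    model.fit(X, y)

    # Save the trained model where the inference pipeline can load it.
    model_path = self.asset.get_model_path()
    with open(os.path.join(model_path, 'random_forest_model.pkl'), 'wb') as f:
        pickle.dump(model, f)

    self.asset.save_info('training finished')  # logging via the ALO API
    # Pass artifacts on unchanged (per the note above, keys must not be deleted).
    self.asset.save_data(data)
    self.asset.save_config(config)
```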
- inference
The inference Asset proceeds much like the train Asset, but calls the inference functions instead of training, producing the predicted classification value, predicted_class, and the probability of the prediction, predict_proba. So that the output Asset can create an inference summary, the original input dataframe is concatenated with the predicted_class dataframe, assigned to data under the key output, and passed to the output Asset. predict_proba is likewise assigned the key probability and passed to the output Asset.
Note: Why concatenate the original input dataframe with predicted_class? The output.csv saved in the output path obtained through self.asset.get_output_path() is shown as a table of inference results in the EdgeConductor UI. AI Solution users can use these inference results directly to request retraining. To support retraining, the original input data format is therefore preserved by the concatenation.
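The inference side could then mirror the sketch below. Again, the model loading and encoding details are assumptions (the real example routes them through the TITANIC class precisely so that train and inference stay consistent); the documented parts are the self.asset.* calls and the output / probability keys.

```python
import os
import pickle
import pandas as pd

def run(self):  # body of the inference Asset's run(); class scaffolding omitted
    data = self.asset.load_data()
    config = self.asset.load_config()

    # Load the model saved by the train pipeline.
    model_path = self.asset.get_model_path()
    with open(os.path.join(model_path, 'random_forest_model.pkl'), 'rb') as f:
        model = pickle.load(f)

    df = data['dataframe0']
    X = pd.get_dummies(df[config['x_columns']])  # must match the train encoding

    predicted_class = model.predict(X)
    predict_proba = model.predict_proba(X)

    # Keep the original columns so output.csv can be reused for retraining.
    data['output'] = pd.concat(
        [df.reset_index(drop=True),
         pd.DataFrame({'predicted_class': predicted_class})],
        axis=1)
    data['probability'] = predict_proba  # consumed by the output Asset

    self.asset.save_data(data)
    self.asset.save_config(config)
```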
- output
The output Asset primarily saves the output file and creates the inference summary. The output file is saved in the path received through self.asset.get_output_path(). The output must take one of three forms: a single csv file, a single jpg (or png, svg) file, or one file of each type. Any other combination of output files causes an error in ALO. This constraint exists to support retraining in EdgeConductor.
If there is a need to save multiple extra output files for external dashboards or data storage, they should be saved in the path received through the self.asset.get_extra_output_path() API.
Additionally, the inference summary file must be created using the self.asset.save_summary() API. This file is necessary for checking inference results on the Mellerikat platform. For API explanations, refer to the Appendix: What is ALO API page.
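Finally, the output Asset's run() body might look like the sketch below. The save_summary parameter names used here (result, score, note) and the way the score is derived from the probabilities are assumptions; check the Appendix: What is ALO API page for the actual signature.

```python
import os

def run(self):  # body of the output Asset's run(); class scaffolding omitted
    data = self.asset.load_data()

    # Exactly one csv (and/or one image) file may be written to this path.
    output_path = self.asset.get_output_path()
    data['output'].to_csv(os.path.join(output_path, 'output.csv'), index=False)

    # Extra files for dashboards or data storage would instead be written
    # under self.asset.get_extra_output_path().

    # Inference summary shown on the Mellerikat platform; the parameter
    # names below are assumptions -- see the 'What is ALO API' appendix.
    proba = data['probability']
    self.asset.save_summary(
        result='OK',
        score=float(proba.max(axis=1).mean()),  # e.g. mean prediction confidence
        note='Titanic survival inference summary.',
    )
```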
Creating a New AI Solution
Once you have a good understanding of the Titanic example, follow the pages below in order to develop new AI Contents: