TCR Release Note
v3.0.0
Apr. 29, 2025
Improvements
TCR v2.2.3 is now available in an ALO-v3 environment.
Compatibility: ALO v3.0.0
v2.2.3
Dec. 30, 2024
Bug fixes
- Fixed a bug where the per-groupkey sampling config was not generated correctly when the groupkey and sampling options were used together.
Compatibility: ALO v2.7.0
v2.2.2
Nov. 11, 2024
Improvements
- Added priority selection logic for cases where the HPO result models have identical evaluation_metric values. For more details, refer to the evaluation_metric parameter section on the 'TCR Parameter' page.
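As a rough illustration of what a tie-break on equal metric values can look like, here is a minimal sketch. The model names and the priority order are hypothetical placeholders, not TCR's actual order; the real rule is the one documented on the 'TCR Parameter' page.

```python
# Hypothetical tie-break order (assumption, not TCR's real priority list).
MODEL_PRIORITY = ["model_a", "model_b", "model_c"]

def select_best(results, priority=MODEL_PRIORITY):
    """Pick the best (model_name, score) pair; higher score is better.

    When scores are equal, the model that appears earlier in `priority` wins.
    """
    return min(results, key=lambda r: (-r[1], priority.index(r[0])))

candidates = [("model_c", 0.91), ("model_a", 0.91), ("model_b", 0.88)]
best = select_best(candidates)
print(best)  # ('model_a', 0.91)
```

Without an explicit second sort key, the winner among tied models would depend on the order in which the results happened to be listed.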
Bug fixes
- Fixed a bug where an indexing error occurred when using the oversampling feature if the y variable contained missing values.
Compatibility: ALO v2.6.0
v2.2.1
Aug. 5, 2024
New Features
- Sampling Asset: Added the user argument 'random_state'.
- Using 'random_state', users can set the random seed to regenerate the sampling output. For more details, see the TCR Parameter document.
- Readiness Asset: Added the user argument 'report'.
- This allows users to turn on and off the generation of report.csv during readiness. For more details, see the TCR Parameter document.
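The effect of a fixed 'random_state' on sampling can be sketched as follows. This is a generic standard-library illustration, not the Sampling asset's code; the function name `sample_rows` is hypothetical.

```python
import random

def sample_rows(rows, n, random_state=None):
    """Draw n rows without replacement; a fixed random_state makes the draw repeatable."""
    # A local generator keeps the seed from affecting other code.
    rng = random.Random(random_state)
    return rng.sample(rows, n)

rows = list(range(100))
first = sample_rows(rows, 5, random_state=42)
second = sample_rows(rows, 5, random_state=42)
print(first == second)  # True: same seed, same sample
```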
Improvements
- Updated the value of inference_summary.yaml depending on the number of inference datasets.
- If there is one inference dataset, 'result' will be the prediction value and 'probability' will be the model prediction probability for each label in the data.
Bug fixes
- Fixed a bug in the readiness report (report.csv) when calculating statistics for numeric columns.
- Changed the logic and fixed a bug in checking columns that have all missing values.
Compatibility: ALO v2.5.2
v2.2
Jun. 20, 2024
New Features
- Added a data summary CSV as an output of the readiness asset.
- report.csv: Provides column types, the cardinality and value distribution of categorical data, statistics for numeric columns, and the number and ratio of missing values.
- Added categorical encoding methods to the preprocessing asset.
- Binary and CatBoost encoding have been added. CatBoost encoding does not create dummy variables, which saves memory. Binary encoding creates fewer dummy columns than one-hot encoding, the previous default method, making it more efficient.
- The default method for categorical encoding has been changed to binary encoding.
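To see why binary encoding needs fewer columns than one-hot encoding, consider the following minimal sketch. It is a plain-Python illustration of the two techniques, not the preprocessing asset's implementation; starting the codes at 1 is an assumption made here for illustration.

```python
def one_hot_encode(values):
    """One column per distinct category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def binary_encode(values):
    """Write each category's integer code in binary, one column per bit."""
    categories = sorted(set(values))
    # Codes start at 1 (assumption), leaving the all-zero pattern unused.
    codes = {c: i + 1 for i, c in enumerate(categories)}
    width = len(categories).bit_length()  # bits needed for the largest code
    return [[(codes[v] >> bit) & 1 for bit in reversed(range(width))] for v in values]

colors = ["red", "green", "blue", "yellow", "black", "white", "pink", "grey"]
print(len(one_hot_encode(colors)[0]))  # 8 columns
print(len(binary_encode(colors)[0]))   # 4 columns
```

The column count grows linearly with cardinality for one-hot encoding but only logarithmically for binary encoding, which is where the efficiency gain comes from.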
Improvements
- Changed the logic of the user argument 'ignore_new_category'
- Previously, when 'ignore_new_category' was set to True, rows with unseen categorical data during inference were deleted. This logic has been changed to fill those rows with missing values instead.
- For CatBoost encoding, new categories can be handled directly, so instead of treating them as missing values, they are encoded as is.
- Changed the default logic of the 'handle missing' function in the preprocessing asset.
- The logic has been changed to fill missing values regardless of the overall missing rate: categorical columns are filled with the most frequent value, and numerical columns are filled with the median value.
- Previously, if the overall missing rate was 10% or less, rows were dropped; if it exceeded 10%, missing values were filled. This threshold has now been removed.
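The two behaviors described above (unseen categories become missing values, and missing values are filled with the median or the most frequent value) can be sketched as follows. The function names are hypothetical, and chaining the two steps in this way is an assumption made for illustration rather than a description of the asset's internals.

```python
import statistics

def mask_unseen(values, known_categories):
    """Replace categories not seen during training with None (a missing value)."""
    return [v if v in known_categories else None for v in values]

def fill_missing(values, kind):
    """Fill None with the median (numeric) or the most frequent value (categorical)."""
    present = [v for v in values if v is not None]
    if kind == "numeric":
        filler = statistics.median(present)
    else:
        filler = statistics.mode(present)
    return [filler if v is None else v for v in values]

masked = mask_unseen(["a", "b", "z", "a"], known_categories={"a", "b"})
print(masked)                                           # ['a', 'b', None, 'a']
print(fill_missing(masked, "categorical"))              # ['a', 'b', 'a', 'a']
print(fill_missing([1.0, None, 3.0, 10.0], "numeric"))  # [1.0, 3.0, 3.0, 10.0]
```

No rows are dropped in either step, so the number of inference rows stays equal to the number of input rows.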
Compatibility: ALO v2.5.1
v2.1.2
May 27, 2024
New Features
- Data analysis results are now provided as CSV files for user convenience
- model_selection.csv: Records the performance of the model used in HPO
- eval_result.csv: Outputs the result score of classification/regression when there is a y label
Improvements
- Modified the logs of readiness asset for easier understanding
- Changed the score value of inference_summary.json to model reliability
Bug fixes
- Resolved column duplication issue occurring in specific data column formats
Compatibility: ALO v2.4.0
v2.1.1
Apr. 24, 2024
Bug fixes
- Bug fix in user arguments
- The 'drop' and 'fill' methods in the 'handle_missing' have been revised.
- The 'evaluation_metric' has been updated due to a bug in the R2 setting.
- Bug fix in Asset code
- The columns used in modeling, which were previously displayed in a transformed form that was difficult for users to understand, are now accurately shown in model_selection.json.
- The reserved word in Train/Inference asset has been updated to avoid overlapping with the user's column names.
- The code for splitting data for HPO has been revised in the Sampling asset.
Compatibility: ALO v2.3.2
v2.1
Apr. 15, 2024
New Features
- New Asset: Sampling for train data sampling based on labels
- A new asset 'Sampling' has been added to the TCR pipeline, which enables users to perform over-sampling and under-sampling based on the label of the 'y' column.
- The data split function that was previously in the train asset has been moved to the sampling asset. This change was made to adapt the sampling to split data for Hyperparameter Optimization (HPO).
- Added a multiprocessing on/off option to the Train asset
- Users can choose to use multiprocessing in Hyperparameter Optimization (HPO) by setting the 'multiprocessing' argument. The default setting is 'False', which means multiprocessing is not performed unless the user specifically sets 'multiprocessing' to 'True' in the YAML file.
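Label-based over-sampling and under-sampling, as introduced with the Sampling asset, can be sketched like this. It is a simplified standard-library illustration under the assumption that every label is resampled to one common target count; the function name and the `target` parameter are hypothetical and do not reflect the asset's actual arguments.

```python
import random
from collections import Counter, defaultdict

def resample_by_label(rows, label_key, target, random_state=None):
    """Bring every label to exactly `target` rows."""
    rng = random.Random(random_state)
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    out = []
    for members in groups.values():
        if len(members) >= target:
            # Under-sampling: keep a random subset without replacement.
            out.extend(rng.sample(members, target))
        else:
            # Over-sampling: keep all rows and add random duplicates.
            out.extend(members + rng.choices(members, k=target - len(members)))
    return out

rows = [{"x": i, "y": "ok" if i < 8 else "fail"} for i in range(10)]
balanced = resample_by_label(rows, "y", target=4, random_state=0)
print(Counter(r["y"] for r in balanced))
```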
Bug fixes
- Argument bug fix
- Fixed a bug in the user argument 'column_types'.
Compatibility: ALO v2.3.2
v2.0.2
Apr. 3, 2024
New Features
- Revised the method of loading files in the Input asset
- The input asset now retrieves all CSV files from the inner folders located under the path that the user specifies in the YAML file. Before TCR ML v2.0.2, the input asset retrieved only the CSV files located directly under the specified path.
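The difference between the two loading behaviors can be sketched with `pathlib`. This is an illustration only: the function name and the `search_inner_folders` flag are hypothetical, and searching exactly one folder level below the given path is an assumption.

```python
import tempfile
from pathlib import Path

def list_csv_files(root, search_inner_folders=True):
    """Return CSV files one folder below root (new) or directly under root (old)."""
    pattern = "*/*.csv" if search_inner_folders else "*.csv"
    return sorted(Path(root).glob(pattern))

# Build a throwaway folder layout to compare both behaviors.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "inner").mkdir()
    (root / "top.csv").write_text("a,b\n1,2\n")
    (root / "inner" / "train.csv").write_text("a,b\n3,4\n")
    new_names = [p.name for p in list_csv_files(root)]
    old_names = [p.name for p in list_csv_files(root, search_inner_folders=False)]

print(new_names)  # ['train.csv']
print(old_names)  # ['top.csv']
```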
Compatibility: ALO v2.3.2
v2.0.1
Mar. 29, 2024
Bug fixes
- Overall bug fixes in the Readiness, Preprocess, and Train assets
- Overall debugging was carried out on the readiness, preprocess, and train assets.
- Limited the use of user arguments in the inference pipeline
- User arguments can no longer be set in the inference pipeline. User arguments inserted into the inference pipeline will not function at all. Instead, the inference pipeline takes its user arguments from the training pipeline.
Improvements
- Added the 'output_type' argument to the train asset
- To output only the columns produced by modeling, set 'output_type' to 'simple'. Only modeling columns such as pred_, prob_, data_split, and shap_ will then appear in output.csv.
- Added the 'num_cpu_core' argument to the train asset
- To control memory usage, the 'num_cpu_core' argument limits the number of cores used for multiprocessing during training.
- Added 'origin_columns' to output.csv
- With 'origin_columns', users can easily match output.csv to their original data.
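The column filtering implied by 'output_type' set to 'simple' can be sketched as follows. The prefixes come from the note above; the function name, the default value "all", and the example column names are hypothetical.

```python
# Column name prefixes listed in the release note for modeling outputs.
MODELING_PREFIXES = ("pred_", "prob_", "data_split", "shap_")

def select_output_columns(columns, output_type="all"):
    """Keep only modeling columns when output_type is 'simple'.

    The default value "all" is a placeholder, not a documented TCR setting.
    """
    if output_type == "simple":
        return [c for c in columns if c.startswith(MODELING_PREFIXES)]
    return list(columns)

columns = ["age", "income", "pred_target", "prob_yes", "prob_no", "data_split", "shap_age"]
print(select_output_columns(columns, output_type="simple"))
# ['pred_target', 'prob_yes', 'prob_no', 'data_split', 'shap_age']
```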
Compatibility: ALO v2.3.1
v2.0.0
Mar. 13, 2024
New Features
- Refactoring the TCR train/inference asset for improved code maintenance
- In this release, there was a refactoring of the train/inference asset, which is the main asset responsible for modeling in TCR. The code has been organized into separate folders based on functionality, and the model files are now managed within their own folder. This allows developers to easily debug the code and incorporate new models
- New Asset: Readiness for data filtering
- In TCR ML v2.0.0, a readiness asset has been added to the pipeline, enabling users to check the suitability of their input data for TCR before the modeling stage. Previously, unsuitable data could only be identified during the preprocess or modeling stages. With the readiness asset, users can find and resolve data issues before running the rest of TCR.
- New features in Preprocess asset: Groupkey and multiprocessing
- The groupkey feature has been newly added to the preprocess asset. The groupkey feature allows data to be grouped, enabling separate modeling for each group. Through this, data preprocessing can be done for each group separately.
- Multiprocessing has been added to the preprocess asset, which can reduce processing time when the groupkey feature is used.
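Per-group preprocessing with a groupkey can be sketched as follows. This is a simplified single-process illustration; the function names, the `line` column, and the centering step are hypothetical examples rather than the preprocess asset's code.

```python
from collections import defaultdict

def split_by_groupkey(rows, groupkey):
    """Partition rows by the value of the groupkey column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[groupkey]].append(row)
    return dict(groups)

def preprocess_per_group(rows, groupkey, preprocess):
    """Apply the same preprocessing function to each group independently."""
    groups = split_by_groupkey(rows, groupkey)
    return {key: preprocess(members) for key, members in groups.items()}

def center_x(members):
    """Example step: subtract the group's own mean from column 'x'."""
    mean = sum(r["x"] for r in members) / len(members)
    return [{**r, "x": r["x"] - mean} for r in members]

rows = [{"line": "A", "x": 1.0}, {"line": "A", "x": 3.0}, {"line": "B", "x": 10.0}]
result = preprocess_per_group(rows, "line", center_x)
print(result)
```

Because each group is processed independently, the per-group calls are natural candidates for running in parallel worker processes.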
Improvements
- Revise the Input Asset to simplify the functions
- The input asset now only loads files and creates a single dataframe, while the data validation feature has been transferred to readiness. To make it customizable for the various data loading situations that arise when developing AI solutions, the input asset has been templated. Users can now customize and use the input asset according to their needs.
- Minimize user arguments in the experimental_plan.yaml file
- In the past, users had to write many user arguments to run TCR. In TCR ML v2.0, we have minimized the number of user arguments that users must write, making it easier and faster for users to run TCR.
- We have added a user arguments guide to the TCR git. Through this guide, users can find explanations and usage instructions for default arguments, as well as a list of custom arguments. Users can apply these arguments to the experimental_plan.yaml file to utilize various features of TCR.
Compatibility: ALO v2.3