Tabular Anomaly Detection (TAD)

Updated 2024.07.06

What is Tabular Anomaly Detection?

  • TAD (Tabular Anomaly Detection) is a system that automatically detects anomalies in tabular data using machine learning.
  • The system learns the patterns of normal data and flags patterns that deviate from them as anomalies.
  • It is effective in detecting new, unseen anomalies, contributing to risk management and stability across various industries.
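The idea of learning from normal data only can be illustrated with a minimal sketch. It uses a simple per-column z-score rule as a stand-in technique; this page does not describe the models TAD actually uses, and the function names, the sample values, and the threshold of 3.0 are all assumptions made for illustration.

```python
import statistics

def fit_normal_profile(rows):
    """Learn per-column mean and standard deviation from normal-only rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 when a column has zero spread, to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds

def anomaly_score(row, profile):
    """Largest absolute z-score across columns; higher means more unusual."""
    means, stds = profile
    return max(abs(v - m) / s for v, m, s in zip(row, means, stds))

def is_anomaly(row, profile, threshold=3.0):
    """Flag a row whose score exceeds the (assumed) threshold."""
    return anomaly_score(row, profile) > threshold

# Hypothetical training data that contains only normal observations.
normal_rows = [[10.0, 0.50], [10.2, 0.52], [9.8, 0.48], [10.1, 0.51], [9.9, 0.49]]
profile = fit_normal_profile(normal_rows)

print(is_anomaly([10.0, 0.50], profile))  # a row close to the learned pattern
print(is_anomaly([14.0, 0.50], profile))  # a row far from the learned pattern
```

Because the profile is built without any labeled anomalies, a rule like this can flag kinds of deviation that never appeared in training, which is the property the bullets above describe.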


When to use TAD?


TAD is useful in the following scenarios:

  • When you want to detect new anomalies.
  • When the training data consists only of normal data.
  • When you want to simplify the pipeline from data preprocessing to model development and deployment for anomaly detection.

TAD can be used for anomaly detection modeling in various domains, including:

  • Manufacturing: Predicting machine failures, quality control.
  • Finance: Detecting abnormal transactions, fraud prediction.
  • Healthcare: Detecting abnormal signs in patients.
  • Public Sector: Detecting abnormal behavior, crime prediction.

Key Features


  • AutoML Feature: Automatically searches for the best model, so users do not need to select or tune models themselves.
  • Data Preprocessing: Provides various data preprocessing techniques to improve data quality.
  • Anomaly Detection: Detects new anomalies using a model trained on normal data.
  • User-Friendliness: Users set a few parameters and run the solution to build an anomaly detection model for their input data.
  • Code-Free Modeling: Runs various preprocessing and modeling experiments automatically from parameters entered in a YAML file.
  • Scalability: Users can add their own machine learning models to use alongside the existing ones.

Quick Start


Installation

Data Preparation

  • Prepare a CSV file containing the columns in which you want to detect anomalies.

  • Each x column value should be a float. Rows that contain empty or NaN values are excluded automatically.

    data.csv

    x_col_1      x_col_2      time_col (optional)   groupkey (optional)   y_col (optional)
    value 1_1    value 1_2    time 1                group1                ok
    value 2_1    value 2_2    time 2                group2                ok
    value 3_1    value 3_2    time 3                group1                ng
    ...          ...          ...                   ...                   ...
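To preview which rows would survive the rule above before running TAD, a small check along the following lines may help. It is only a sketch of the described behavior, not TAD's own loader; the function name `load_valid_rows`, the column names, and the sample values are hypothetical.

```python
import csv
import io
import math

def load_valid_rows(csv_text, x_columns):
    """Return rows whose x columns all parse as non-NaN floats, plus a skip count."""
    kept, skipped = [], 0
    for record in csv.DictReader(io.StringIO(csv_text)):
        try:
            values = [float(record[name]) for name in x_columns]
        except (TypeError, ValueError):
            # Empty cells and non-numeric text cannot be converted to float.
            skipped += 1
            continue
        if any(math.isnan(v) for v in values):
            # The literal text "NaN" parses as a float, so check it explicitly.
            skipped += 1
            continue
        kept.append(values)
    return kept, skipped

# Hypothetical file contents with one empty cell and one NaN cell.
sample = """x_col_1,x_col_2,time_col,groupkey,y_col
1.0,2.5,2024-01-01,group1,ok
,3.1,2024-01-02,group2,ok
4.2,NaN,2024-01-03,group1,ng
0.7,1.9,2024-01-04,group2,ok
"""

rows, skipped = load_valid_rows(sample, ["x_col_1", "x_col_2"])
print(rows)     # rows that passed the check
print(skipped)  # number of rows dropped
```

Only the x columns are checked here, since the optional time, group key, and label columns in the example hold non-numeric values.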

Required Parameter Settings

  1. Modify the following data paths in ad/experimental_plan.yaml. If you only run training, load_inference_data_path does not need to be modified.

    external_path:
    - load_train_data_path: ./solution/sample_data/train/
    - load_inference_data_path: ./solution/sample_data/test/
  2. In the args of step: readiness, enter the x_columns and, optionally, the y_column that match the training data.

    • x_columns: List the names of the x columns in your data; only these columns are used for model training.
    • y_column: Name of the column in your data to use as the label. (If left blank, the data is treated as unlabeled.)

    - step: readiness
      args:
        - x_columns: [x0, x1, x2, ...] # x column names in your data to use for training
        - y_column: target # label column name in your data (leave blank if there is no label)
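For wide CSV files, typing the x_columns list by hand is error-prone. The sketch below derives the list from a CSV header and prints the two args lines for pasting into the readiness step. The helper names and the example header are hypothetical, and the script only formats text; it does not read or modify experimental_plan.yaml.

```python
import csv
import io

def infer_x_columns(header, exclude):
    """Keep every header name that is not listed as a non-feature column."""
    excluded = set(exclude)
    return [name for name in header if name not in excluded]

def readiness_args_snippet(x_columns, y_column=None):
    """Format the two readiness args as YAML text for manual pasting."""
    lines = [f"- x_columns: [{', '.join(x_columns)}]"]
    # An empty y_column value corresponds to data without a label.
    lines.append(f"- y_column: {y_column}" if y_column else "- y_column:")
    return "\n".join(lines)

# Hypothetical header row taken from the first line of a training CSV.
header_line = "x_col_1,x_col_2,time_col,groupkey,y_col"
header = next(csv.reader(io.StringIO(header_line)))

x_columns = infer_x_columns(header, exclude=["time_col", "groupkey", "y_col"])
print(readiness_args_snippet(x_columns, y_column="y_col"))
```

Indent the printed lines to match the args block of your plan file when pasting them in.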

By setting only steps 1 and 2 and running ALO, you can create a TAD model.

To create a model that better fits your data, use the more advanced parameter settings. Learn more: TAD Parameter

Execution

  • You can run it in a terminal or a Jupyter notebook. Learn more: Develop AI Solution
  • The execution results include a trained model file, prediction results, and performance charts.


TAD Version: 1.0.0, ALO Version: 2.5.2