Version: Next

Dataset

Updated 2025.02.18

Datasets are essential for training AI models: they let users manage and use many forms of data. Because good training datasets are essential for accurate models, the system provides users with various tools for configuring and using datasets. When training in the stream menu, users can select one or more of the datasets they have created as training data.


Sources

Users can create datasets from several sources: Edge inference results, files uploaded from a PC, and objects stored in S3. S3 objects are not copied into the Edge Conductor system; they are used as training data directly from S3, so no additional storage space is allocated within Edge Conductor.
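The difference between copied and referenced sources can be pictured as a small data model. This is a minimal Python sketch, not the Edge Conductor API: the `Dataset` and `DataFile` classes, the source tags, and the `internal/...` storage path are all hypothetical. It assumes uploaded files are copied into internal storage, while S3 objects are recorded only by their `s3://` URI and therefore add nothing to the internal storage total.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    """One file in a dataset (hypothetical model, not the Edge Conductor API)."""
    name: str
    source: str       # hypothetical source tags: "edge", "upload", or "s3"
    location: str     # internal storage path, or the original s3:// URI
    size_bytes: int

    @property
    def stored_internally(self) -> bool:
        # S3 objects are referenced in place; other sources are copied in.
        return self.source != "s3"


@dataclass
class Dataset:
    name: str
    files: List[DataFile] = field(default_factory=list)

    def add_upload(self, name: str, size_bytes: int) -> None:
        # A PC upload is copied into (hypothetical) internal storage.
        path = f"internal/{self.name}/{name}"
        self.files.append(DataFile(name, "upload", path, size_bytes))

    def add_s3_object(self, uri: str, size_bytes: int) -> None:
        # Only the reference is recorded; the object itself stays in S3.
        if not uri.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {uri}")
        name = uri.rsplit("/", 1)[-1]
        self.files.append(DataFile(name, "s3", uri, size_bytes))

    def internal_storage_bytes(self) -> int:
        # Only copied files consume storage inside the system.
        return sum(f.size_bytes for f in self.files if f.stored_internally)


ds = Dataset("defects")
ds.add_upload("img_001.png", 2_000)
ds.add_s3_object("s3://my-bucket/images/img_002.png", 5_000)
print(ds.internal_storage_bytes())  # 2000
```

Keeping the URI as the file's location, rather than a copied path, is what lets the S3 object be read directly at training time without duplicating it.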

Data Types

Both structured and unstructured data are supported. When creating a dataset, users select a solution, and that solution determines which data types the dataset supports. Structured data can be reviewed visually in table form, and unstructured image data in an image viewer.
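The rule that the selected solution determines the supported data types can be sketched as a simple lookup. This is an illustrative Python sketch only: the solution names, the `SOLUTION_DATA_TYPES` catalog, and the `viewer_for` helper are hypothetical and do not reflect how Edge Conductor actually stores solution metadata.

```python
from enum import Enum


class DataType(Enum):
    STRUCTURED = "structured"   # reviewed in a table view
    IMAGE = "image"             # reviewed in an image viewer


# Hypothetical catalog: each solution declares the data types it accepts.
SOLUTION_DATA_TYPES = {
    "tabular-anomaly-detection": {DataType.STRUCTURED},
    "image-classification": {DataType.IMAGE},
}


def viewer_for(solution: str, data_type: DataType) -> str:
    """Check the data type against the solution and pick a viewer."""
    supported = SOLUTION_DATA_TYPES.get(solution)
    if supported is None:
        raise KeyError(f"unknown solution: {solution}")
    if data_type not in supported:
        raise ValueError(f"{solution} does not accept {data_type.value} data")
    return "table" if data_type is DataType.STRUCTURED else "image viewer"


print(viewer_for("image-classification", DataType.IMAGE))            # image viewer
print(viewer_for("tabular-anomaly-detection", DataType.STRUCTURED))  # table
```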

Solution

Each dataset must be associated with exactly one AI solution. When selecting training datasets during stream model training, only datasets mapped to the same AI solution can be selected together. As a result, inference data from different Edges running the same AI solution can be combined as training data.
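The selection rule can be expressed as a check that all chosen datasets share one solution. This is a minimal Python sketch under assumed names: `DatasetInfo`, `select_training_datasets`, and the dataset, solution, and Edge identifiers are hypothetical, not part of the Edge Conductor interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetInfo:
    name: str
    solution: str               # each dataset maps to exactly one AI solution
    edge: Optional[str] = None  # source Edge for inference-result data, if any


def select_training_datasets(candidates: List[DatasetInfo],
                             chosen_names: List[str]) -> List[DatasetInfo]:
    """Return the chosen datasets, rejecting a mix of different AI solutions."""
    by_name = {d.name: d for d in candidates}
    chosen = [by_name[n] for n in chosen_names]
    solutions = {d.solution for d in chosen}
    if len(solutions) > 1:
        raise ValueError(f"datasets belong to different solutions: {sorted(solutions)}")
    return chosen


datasets = [
    DatasetInfo("line-a-results", "image-classification", edge="edge-01"),
    DatasetInfo("line-b-results", "image-classification", edge="edge-02"),
    DatasetInfo("sensor-logs", "tabular-anomaly-detection"),
]
selected = select_training_datasets(datasets, ["line-a-results", "line-b-results"])
print([d.edge for d in selected])  # ['edge-01', 'edge-02']
```

Here two datasets built from different Edges pass the check because they share `image-classification`; adding `sensor-logs` to the selection would raise an error.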

Editing

A dataset stores one or more data files, and users can delete individual files even after the dataset has been created.

Relabeling

Users can review the data in a dataset and correct labels with the relabeling function, which is available for image data only. Correcting incorrectly tagged training data keeps the dataset accurate and helps ensure reliable model performance. For each entry, the system shows the manual label and the date it was last modified, so users can check the tagging history. Each data entry also has a score value, which lets users prioritize entries with low scores and relabel them selectively.

If a dataset has already been used for model training, users may want to keep it unchanged. In that case, the "Save As" function in the relabel menu clones the dataset, so labels can be corrected in the copy while the original dataset, including its build usage status, remains unmodified.
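The relabeling workflow (review low-score entries first, record manual labels with a modification time, and clone with "Save As" before editing a dataset already used in training) can be sketched as follows. This is a hypothetical Python model, not the Edge Conductor implementation: `ImageDataset`, `Entry`, `save_as`, `relabel`, and the score threshold are illustrative names, and the assumption that a clone starts out as not yet used in training is labeled in the code.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Entry:
    file: str
    label: str
    score: float                            # per-entry score (meaning assumed)
    manual_label: Optional[str] = None      # set when a user relabels the entry
    modified_at: Optional[datetime] = None  # time of the last manual change


@dataclass
class ImageDataset:
    name: str
    entries: List[Entry] = field(default_factory=list)
    used_in_training: bool = False

    def low_score_entries(self, threshold: float) -> List[Entry]:
        # Entries below the threshold, lowest score first, for review.
        below = [e for e in self.entries if e.score < threshold]
        return sorted(below, key=lambda e: e.score)

    def save_as(self, new_name: str) -> "ImageDataset":
        # Deep copy: relabeling the clone never touches the original entries.
        clone = copy.deepcopy(self)
        clone.name = new_name
        clone.used_in_training = False  # assumption: the copy starts unused
        return clone


def relabel(entry: Entry, new_label: str) -> None:
    # Record the manual label and when it was changed (tagging history).
    entry.manual_label = new_label
    entry.modified_at = datetime.now()


original = ImageDataset("defects-v1", [
    Entry("a.png", "ok", 0.95),
    Entry("b.png", "scratch", 0.41),
    Entry("c.png", "ok", 0.22),
], used_in_training=True)

work = original.save_as("defects-v2")
for e in work.low_score_entries(threshold=0.5):
    print(e.file, e.score)               # c.png 0.22, then b.png 0.41
relabel(work.entries[2], "dent")
print(original.entries[2].manual_label)  # None: the original is unchanged
```

A deep copy is used rather than a shallow one because the entries are mutable objects; a shallow copy would share them, and relabeling the clone would silently alter the dataset that was already used for training.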
