Manage Dataset
Datasets are essential for training AI models: they let you manage and use data of various types from multiple sources. In Edge Conductor, a Dataset is a collection of similar data files, either image files (.jpg, .jpeg, or .png) or tabular files (.csv), intended for use as training data for an AI Solution's model. Each Dataset must therefore be associated with a specific AI Solution.
Create Dataset
Dataset → New Dataset
Create a Dataset by selecting an AI Solution and a data source.
Select AI Solution
When creating a Dataset, select an AI Solution. The AI Solution defines the data type and whether relabeling is supported, and the data must follow the format given in the data specification. A Dataset cannot be shared among multiple AI Solutions; it must exactly match the attributes of its AI Solution for the AI model to work correctly. However, multiple Datasets can be associated with the same AI Solution; they share the same data specification and can be selected together for training.
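The association rules above can be pictured as a small data model. The following Python sketch is illustrative only: the `Dataset` class, its fields, and the `check_training_selection` helper are hypothetical names, not part of Edge Conductor. It assumes each Dataset records exactly one AI Solution identifier and rejects a training selection that mixes solutions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    # Hypothetical model: each Dataset belongs to exactly one AI Solution.
    name: str
    solution_id: str


def check_training_selection(datasets: list[Dataset], solution_id: str) -> None:
    """Raise ValueError unless every selected Dataset belongs to solution_id."""
    if not datasets:
        raise ValueError("select at least one Dataset")
    mismatched = [d.name for d in datasets if d.solution_id != solution_id]
    if mismatched:
        raise ValueError(f"Datasets not associated with {solution_id}: {mismatched}")


# Two Datasets of the same AI Solution can be selected together for training.
check_training_selection(
    [Dataset("line-a-june", "defect-detector"), Dataset("line-b-june", "defect-detector")],
    "defect-detector",
)
```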
Data Source
The system currently supports the following data sources for creating Datasets: Edge, Local, and S3.
Edge
Collect inference results from Edges to create a Dataset, enabling model training on real-world data. You can gather data from one or more Edges within a specific date range. Data can only be collected from Edges running the same AI Solution as the Dataset; if the desired Edge is not listed, check that its solution information matches.
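The Edge selection rules can be summarized as a filter over inference results. This is a minimal sketch under stated assumptions: the `InferenceResult` record, its fields, and the `collect_from_edges` function are hypothetical, and the date range is assumed to be inclusive at both ends.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class InferenceResult:
    # Hypothetical record shape; real Edge inference results may differ.
    edge_id: str
    solution_id: str
    day: date
    file_name: str


def collect_from_edges(results, edge_ids, solution_id, start, end):
    """Keep results from the chosen Edges, the Dataset's AI Solution, and [start, end]."""
    chosen = set(edge_ids)
    return [
        r
        for r in results
        if r.edge_id in chosen
        and r.solution_id == solution_id  # only Edges running the same AI Solution
        and start <= r.day <= end  # assumed inclusive date range
    ]
```

For example, selecting two Edges for May keeps only the May results produced under the Dataset's AI Solution and drops results from an Edge running a different solution.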
Local
Upload data stored on a local PC to create a Dataset, allowing AI Operators to work efficiently with their own data. The required file format is described under Dataset Specification.
S3
Link data stored in AWS S3 to create a Dataset. The data is used directly from S3, so no additional storage space is allocated in Edge Conductor, which makes it efficient to manage and use large data volumes. Currently, only tabular file formats are supported; support for additional formats and for multiple files or folders is planned. Edge Conductor must have access permissions to the S3 bucket and prefix; these are configured during system installation. Contact the system administrator for permission issues.
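Before asking for access, it can help to work out exactly which bucket and prefix a link points to. The sketch below is a hypothetical pre-check, not an Edge Conductor or AWS API: `split_s3_uri`, `check_linkable`, and the `bucket/prefix/` format of `allowed_prefixes` are assumptions for illustration. Actual permissions are enforced by AWS, so this only mirrors the rules stated above (tabular files only, access limited to configured prefixes).

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/prefix/file.csv' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def check_linkable(uri: str, allowed_prefixes: list[str]) -> str:
    """Return the object key if it is a CSV file under a permitted bucket/prefix."""
    bucket, key = split_s3_uri(uri)
    # Only tabular files can currently be linked.
    if not key.lower().endswith(".csv"):
        raise ValueError("only tabular (.csv) files can currently be linked")
    # Hypothetical list of "bucket/prefix/" strings the administrator has granted.
    if not any(f"{bucket}/{key}".startswith(p) for p in allowed_prefixes):
        raise PermissionError(f"no access configured for s3://{bucket}/{key}")
    return key
```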
To create a Dataset
- Select "Dataset" from the Navigator Bar.
- Click "New Dataset" in the top right corner.
- Enter metadata such as Name and Tag. Use a distinct Name to avoid confusion, and add Tags to make the Dataset easier to find.
- Select the AI Solution that matches the Dataset's purpose. The features available in the Dataset depend on the selected AI Solution.
- Choose the Data Source:
  - Edge: Collect inference results from Edges to create the Dataset.
  - Local: Upload files from a local PC to create the Dataset.
  - AWS S3: Link files stored in AWS S3.
- Follow the system guide provided for the chosen Data Source to select files for the Dataset.
Edit or Delete Dataset
Dataset → Actions → Edit(Delete) Dataset
Edit a created Dataset to rename it or remove specific files. Datasets no longer in use can be deleted.
To edit or delete a Dataset
- Select "Dataset" from the Navigator Bar.
- Review the registered Datasets and select the one to edit or delete.
- Click "Actions" and choose Edit or Delete.
Dataset Specification
Edge Conductor Datasets support tabular (.csv) and image (.jpg, .jpeg, .png) files. The AWS S3 link method can use data that falls outside these specifications, so create the Dataset from AWS S3 for non-spec data.
Tabular Data
- File Extension: CSV
- Encoding: UTF-8
- Maximum size per file: 300 MB
- Upload format: a ZIP archive of CSV files (all CSV files must have the same format)
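A ZIP archive can be checked against the tabular specification before upload. The following is a minimal sketch, assuming that "the same format" means an identical header row and that 1 MB is 1024 × 1024 bytes; `validate_tabular_zip` is a hypothetical helper, not part of Edge Conductor.

```python
import csv
import io
import zipfile

MAX_FILE_BYTES = 300 * 1024 * 1024  # 300 MB per CSV file (assumes 1 MB = 1024 * 1024 bytes)


def validate_tabular_zip(path_or_file) -> list[str]:
    """Return the problems found in a ZIP archive of CSV files (empty list if none)."""
    problems = []
    header = None
    with zipfile.ZipFile(path_or_file) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            name = info.filename
            if not name.lower().endswith(".csv"):
                problems.append(f"{name}: not a .csv file")
                continue
            # file_size is the uncompressed size of the entry.
            if info.file_size > MAX_FILE_BYTES:
                problems.append(f"{name}: larger than 300 MB")
                continue
            try:
                # Decode as UTF-8, ignoring a leading byte-order mark if present.
                text = zf.read(name).decode("utf-8-sig")
            except UnicodeDecodeError:
                problems.append(f"{name}: not valid UTF-8")
                continue
            first_row = next(csv.reader(io.StringIO(text)), None)
            if header is None:
                header = first_row  # the first CSV file sets the expected header
            elif first_row != header:
                problems.append(f"{name}: header differs from the first CSV file")
    return problems
```

For example, `validate_tabular_zip("sales.zip")` returns an empty list when every entry is a UTF-8 CSV file within the size limit and all files share the same header row.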
Image Data
- File Extension: JPG, JPEG, PNG
- Upload format: a ZIP archive of image files, organized into one folder per label
- Example:
./{label1}/
├ image1.png
├ image1.jpeg
└ image2.jpeg
./{label2}/
├ image1.png
└ image1.jpeg
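The folder layout can be verified locally before upload. This sketch assumes the label folders sit at the root of the ZIP archive (no extra wrapping folder) and that each image sits directly inside its label folder; `summarize_image_zip` is a hypothetical helper that counts images per label and lists entries that break the layout.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def summarize_image_zip(path_or_file) -> tuple[Counter, list[str]]:
    """Count images per label folder and list entries that break the layout."""
    counts = Counter()
    problems = []
    with zipfile.ZipFile(path_or_file) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # PurePosixPath drops a leading "./", so "./label1/x.png" -> ("label1", "x.png").
            parts = PurePosixPath(info.filename).parts
            if len(parts) != 2:
                # Expect exactly "<label>/<image file>".
                problems.append(f"{info.filename}: not directly inside a label folder")
                continue
            label, file_name = parts
            if PurePosixPath(file_name).suffix.lower() not in ALLOWED_EXTENSIONS:
                problems.append(f"{info.filename}: unsupported extension")
                continue
            counts[label] += 1
    return counts, problems
```

Checking the per-label counts is also a quick way to spot an empty or misspelled label folder before uploading.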