Dataset
add_annotations(annotations)
Add "Annotation"s to dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
annotations |
list[Annotation]
|
list of "Annotation"s |
required |
add_categories(categories)
Add "Category"s to dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
categories |
list[Category]
|
list of "Category"s |
required |
add_images(images)
Add "Image"s to dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images |
list[Image]
|
list of "Image"s |
required |
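The add_* methods above all follow the same append-style pattern: each takes a list of records and adds them to the dataset's corresponding collection. A minimal in-memory sketch of that pattern (the `SimpleDataset` class here is an illustrative stand-in, not waffle's actual implementation):

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDataset:
    """Illustrative stand-in: a dataset that collects records by kind."""

    images: list = field(default_factory=list)
    categories: list = field(default_factory=list)
    annotations: list = field(default_factory=list)

    def add_images(self, images):
        # Append every given image record to the dataset.
        self.images.extend(images)

    def add_categories(self, categories):
        self.categories.extend(categories)

    def add_annotations(self, annotations):
        self.annotations.extend(annotations)


ds = SimpleDataset()
ds.add_images(["img1.jpg", "img2.jpg"])
ds.add_categories(["cat", "dog"])
ds.add_annotations([{"image": "img1.jpg", "category": "cat"}])
print(len(ds.images), len(ds.categories), len(ds.annotations))  # 2 2 1
```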
add_predictions(predictions)
Add "Annotation"s to dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
annotations |
list[Annotation]
|
list of "Annotation"s |
required |
check_trainable()
Check whether the dataset is trainable.

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the dataset does not have enough annotations. |
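Unlike trainable(), check_trainable raises instead of returning a boolean. A sketch of that raise-on-failure pattern (the minimum-count threshold below is a made-up illustration, not waffle's actual rule):

```python
def check_trainable(num_annotations, min_annotations=1):
    """Raise ValueError when there are not enough annotations to train.

    min_annotations is a hypothetical threshold used only for illustration.
    """
    if num_annotations < min_annotations:
        raise ValueError(
            f"Dataset has {num_annotations} annotations; "
            f"at least {min_annotations} are required for training."
        )


check_trainable(10)  # enough annotations: returns None silently
try:
    check_trainable(0)
except ValueError as e:
    print(e)
```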
clone(src_name, name, src_root_dir=None, root_dir=None)
classmethod
Clone an existing dataset.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| src_name | `str` | Name of the dataset to clone. It must be a Waffle-created dataset. | required |
| name | `str` | New dataset name. | required |
| src_root_dir | `str` | Source dataset root directory. Defaults to None. | None |
| root_dir | `str` | New dataset root directory. Defaults to None. | None |

Raises:

| Type | Description |
| --- | --- |
| `FileNotFoundError` | If the source dataset does not exist. |
| `FileExistsError` | If the new dataset name already exists. |

Examples:

>>> ds = Dataset.clone("my_dataset", "my_dataset_clone")
>>> ds.name
'my_dataset_clone'  # cloned dataset name
>>> ds.task
'CLASSIFICATION'  # original dataset task

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Dataset | `Dataset` | `Dataset` instance. |
delete()
Delete the dataset.
dummy(name, task, image_num=100, category_num=10, unlabeled_image_num=0, root_dir=None)
classmethod
Create a dummy dataset (for debugging).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | `str` | Dataset name. | required |
| task | `str` | Dataset task. | required |
| image_num | `int` | Number of images. Defaults to 100. | 100 |
| category_num | `int` | Number of categories. Defaults to 10. | 10 |
| unlabeled_image_num | `int` | Number of unlabeled images. Defaults to 0. | 0 |
| root_dir | `str` | Dataset root directory. Defaults to None. | None |

Raises:

| Type | Description |
| --- | --- |
| `FileExistsError` | If the dataset name already exists. |
Examples:
>>> ds = Dataset.dummy("my_dataset", "CLASSIFICATION", image_num=100, category_num=10)
>>> len(ds.get_images())
100
>>> len(ds.get_categories())
10
export(data_type)
Export the dataset to a specific data format.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data_type | `Union[str, DataType]` | Export data type. One of ["YOLO", "COCO"]. | required |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If data_type is not one of DataType. |

Examples:

>>> dataset = Dataset.load("some_dataset")
>>> dataset.export(data_type="YOLO")
'path/to/dataset_dir/exports/yolo'

You can train with the exported dataset:

>>> hub.train("path/to/dataset_dir/exports/yolo", ...)

Returns:

| Name | Type | Description |
| --- | --- | --- |
| str | `str` | Exported dataset directory. |
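Since data_type accepts either a string or a DataType member, export presumably normalizes its input before dispatching. A sketch of such normalization with a stand-in enum (this `DataType` is illustrative, not waffle's actual enum):

```python
from enum import Enum


class DataType(Enum):
    """Stand-in for the library's DataType enum (illustrative only)."""

    YOLO = "YOLO"
    COCO = "COCO"


def normalize_data_type(data_type):
    # Accept an enum member as-is; convert strings case-insensitively.
    if isinstance(data_type, DataType):
        return data_type
    try:
        return DataType(str(data_type).upper())
    except ValueError:
        raise ValueError(
            f"data_type must be one of {[t.value for t in DataType]}, "
            f"got {data_type!r}"
        )


print(normalize_data_type("yolo"))         # DataType.YOLO
print(normalize_data_type(DataType.COCO))  # DataType.COCO
```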
from_autocare_dlt(name, task, coco_file, coco_root_dir, root_dir=None)
classmethod
Import a dataset from autocare dlt format.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | `str` | Dataset name. | required |
| task | `str` | Dataset task. | required |
| coco_file | `Union[str, list[str]]` | Coco annotation file path. | required |
| coco_root_dir | `Union[str, list[str]]` | Root directory of the coco dataset. | required |
| root_dir | `str` | Dataset root directory. Defaults to None. | None |

Raises:

| Type | Description |
| --- | --- |
| `FileExistsError` | If the new dataset name already exists. |

Examples:

Import one coco json file.

>>> ds = Dataset.from_autocare_dlt("my_dataset", "object_detection", "path/to/coco.json", "path/to/coco_root")
>>> ds.get_images()
{<Image: 1>, <Image: 2>, <Image: 3>, <Image: 4>, <Image: 5>}
>>> ds.get_annotations()
{<Annotation: 1>, <Annotation: 2>, <Annotation: 3>, <Annotation: 4>, <Annotation: 5>}
>>> ds.get_categories()
{<Category: 1>, <Category: 2>, <Category: 3>, <Category: 4>, <Category: 5>}
>>> ds.get_category_names()
['person', 'bicycle', 'car', 'motorcycle', 'airplane']

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Dataset | `Dataset` | `Dataset` instance. |
from_coco(name, task, coco_file, coco_root_dir, root_dir=None)
classmethod
Import a dataset from COCO format.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | `str` | Dataset name. | required |
| task | `str` | Dataset task. | required |
| coco_file | `Union[str, list[str]]` | Coco json file path. If given a list, it is regarded as [train, val, test] json files. | required |
| coco_root_dir | `Union[str, list[str]]` | Coco image root directory. If given a list, it is regarded as [train, val, test] root directories. | required |
| root_dir | `str` | Dataset root directory. Defaults to None. | None |

Raises:

| Type | Description |
| --- | --- |
| `FileExistsError` | If the new dataset name already exists. |

Examples:

Import one coco json file.

>>> ds = Dataset.from_coco("my_dataset", "object_detection", "path/to/coco.json", "path/to/coco_root")
>>> ds.get_images()
{<Image: 1>, <Image: 2>, <Image: 3>, <Image: 4>, <Image: 5>}
>>> ds.get_annotations()
{<Annotation: 1>, <Annotation: 2>, <Annotation: 3>, <Annotation: 4>, <Annotation: 5>}
>>> ds.get_categories()
{<Category: 1>, <Category: 2>, <Category: 3>, <Category: 4>, <Category: 5>}
>>> ds.get_category_names()
['person', 'bicycle', 'car', 'motorcycle', 'airplane']

Import multiple coco json files by passing coco_file as a list.
The given coco files are regarded as [train, [val, [test]]] json files.

>>> ds = Dataset.from_coco("my_dataset", "object_detection", ["coco_train.json", "coco_val.json"], ["coco_train_root", "coco_val_root"])

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Dataset | `Dataset` | `Dataset` instance. |
from_transformers(name, task, dataset_dir, root_dir=None)
classmethod
Import a dataset from transformers datasets, loading it from a directory.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | `str` | Dataset name. | required |
| task | `str` | Task name. | required |
| dataset_dir | `str` | Transformers dataset directory. | required |
| root_dir | `str` | Dataset root directory. Defaults to None. | None |

Raises:

| Type | Description |
| --- | --- |
| `FileExistsError` | If the dataset name already exists. |
| `ValueError` | If the dataset is not a Dataset or DatasetDict. |

Examples:

>>> ds = Dataset.from_transformers("transformers", "object_detection", "path/to/transformers/dataset")

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Dataset | `Dataset` | `Dataset` instance. |
from_yolo(name, task, yaml_path, root_dir=None)
classmethod
Import a dataset from YOLO format, using a yolo (ultralytics) yaml file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | `str` | Dataset name. | required |
| task | `str` | Dataset task. | required |
| yaml_path | `str` | Yolo yaml file path. | required |
| root_dir | `str` | Dataset root directory. Defaults to None. | None |

Examples:

>>> ds = Dataset.from_yolo("yolo", "classification", "path/to/yolo.yaml")

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Dataset | `Dataset` | Imported dataset. |
get_annotations(image_id=None)
Get "Annotation"s.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
image_id |
int
|
image id. None for all "Annotation"s. Defaults to None. |
None
|
Returns:
Type | Description |
---|---|
list[Annotation]
|
list[Annotation]: "Annotation" list |
get_categories(category_ids=None)
Get "Category"s.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
category_ids |
list[int]
|
id list. None for all "Category"s. Defaults to None. |
None
|
Returns:
Type | Description |
---|---|
list[Category]
|
list[Category]: "Category" list |
get_dataset_info()
Get DatasetInfo.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| DatasetInfo | `DatasetInfo` | Dataset information. |
get_dataset_list(root_dir=None)
classmethod
Get the dataset name list in root_dir.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| root_dir | `str` | Dataset root directory. Defaults to None. | None |

Returns:

| Type | Description |
| --- | --- |
| `list[str]` | Dataset name list. |
get_images(image_ids=None, labeled=True)
Get "Image"s.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
image_ids |
list[int]
|
id list. None for all "Image"s. Defaults to None. |
None
|
labeled |
bool
|
get labeled images. False for unlabeled images. Defaults to True. |
True
|
Returns:
Type | Description |
---|---|
list[Image]
|
list[Image]: "Image" list |
get_predictions(image_id=None)
Get "Prediction"s.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
image_id |
int
|
image id. None for all "Prediction"s. Defaults to None. |
None
|
Returns:
Type | Description |
---|---|
list[Annotation]
|
list[Annotation]: "Prediction" list |
get_split_ids()
Get split ids.

Returns:

| Type | Description |
| --- | --- |
| `list[list[int]]` | Split ids as [train, val, test, unlabeled] image id lists. |
initialize()
Initialize the dataset. Creates the necessary directories under {dataset_root_dir}/{dataset_name}.
initialized()
Check whether the dataset has been initialized.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| bool | `bool` | True if initialized, False otherwise. |
load(name, root_dir=None)
classmethod
Load an existing dataset.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | `str` | Name of a Waffle-created dataset. | required |
| root_dir | `str` | Dataset root directory. Defaults to None. | None |

Raises:

| Type | Description |
| --- | --- |
| `FileNotFoundError` | If the dataset does not exist. |

Examples:

>>> ds = Dataset.load("my_dataset")
>>> ds.name
'my_dataset'  # dataset name

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Dataset | `Dataset` | `Dataset` instance. |
merge(name, root_dir, src_names, src_root_dirs, task)
classmethod
Merge multiple datasets into one dataset.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | `str` | New dataset name. | required |
| root_dir | `str` | New dataset root directory. | required |
| src_names | `list[str]` | Source dataset names. | required |
| src_root_dirs | `Union[str, list[str]]` | Source dataset root directories. | required |
| task | `str` | Dataset task. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Dataset | `Dataset` | `Dataset` instance. |
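Merging datasets implies reconciling category ids across the sources, since each source numbers its own categories independently. A sketch of one way to do that, by unifying categories on name and recording per-source id remaps (purely illustrative; the documentation above does not specify waffle's actual merge strategy):

```python
def merge_categories(src_category_lists):
    """Build a merged category list and, per source, an old-id -> new-id map.

    Assumes 1-based category ids and name-based identity, which is an
    assumption made for illustration only.
    """
    merged = []
    id_maps = []
    for names in src_category_lists:
        id_map = {}
        for old_id, name in enumerate(names, start=1):
            if name not in merged:
                merged.append(name)
            # The new id is the category's 1-based position in the merged list.
            id_map[old_id] = merged.index(name) + 1
        id_maps.append(id_map)
    return merged, id_maps


merged, id_maps = merge_categories([["cat", "dog"], ["dog", "bird"]])
print(merged)   # ['cat', 'dog', 'bird']
print(id_maps)  # [{1: 1, 2: 2}, {1: 2, 2: 3}]
```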
new(name, task, root_dir=None)
classmethod
Create a new dataset. This method creates a new dataset directory and initializes the dataset info file. If you have other types of data, use the from_* methods to create a dataset instead.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | `str` | Dataset name. | required |
| task | `str` | Dataset task. | required |
| root_dir | `str` | Dataset root directory. Defaults to None. | None |

Raises:

| Type | Description |
| --- | --- |
| `FileExistsError` | If the dataset name already exists. |

Examples:

>>> ds = Dataset.new("my_dataset", "CLASSIFICATION")
>>> ds.name
'my_dataset'  # dataset name
>>> ds.task
'CLASSIFICATION'  # dataset task

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Dataset | `Dataset` | `Dataset` instance. |
sample(name, task, root_dir=None)
classmethod
Import a sample dataset.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | `str` | Dataset name. | required |
| task | `str` | Task name. | required |
| root_dir | `str` | Dataset root directory. Defaults to None. | None |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Dataset | `Dataset` | `Dataset` instance. |
split(train_ratio, val_ratio=0.0, test_ratio=0.0, method=SplitMethod.RANDOM, seed=0)
Split the dataset into train, validation, test, (and unlabeled) sets.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| train_ratio | `float` | Train set ratio (0 ~ 1). | required |
| val_ratio | `float` | Validation set ratio (0 ~ 1). | 0.0 |
| test_ratio | `float` | Test set ratio (0 ~ 1). | 0.0 |
| method | `Union[str, SplitMethod]` | Split method. Defaults to SplitMethod.RANDOM. | SplitMethod.RANDOM |
| seed | `int` | Random seed. Defaults to 0. | 0 |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If train_ratio is not between 0.0 and 1.0. |
| `ValueError` | If train_ratio + val_ratio + test_ratio is not 1.0. |
Examples:
>>> dataset = Dataset.load("some_dataset")
>>> dataset.split(train_ratio=0.8, val_ratio=0.1, test_ratio=0.1)
>>> dataset.get_split_ids()
[[1, 2, 3, 4, 5, 6, 7, 8], [9], [10], []] # train, val, test, unlabeled image ids
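The ratio semantics above can be sketched with a plain seeded random split over image ids (a simplified stand-in for SplitMethod.RANDOM, not the library's implementation):

```python
import random


def split_ids(image_ids, train_ratio, val_ratio=0.0, test_ratio=0.0, seed=0):
    """Shuffle ids with a fixed seed, then cut [train, val, test] by ratio."""
    if not 0.0 < train_ratio <= 1.0:
        raise ValueError("train_ratio must be between 0.0 and 1.0")
    if abs(train_ratio + val_ratio + test_ratio - 1.0) > 1e-9:
        raise ValueError("train_ratio + val_ratio + test_ratio must be 1.0")
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # seeded shuffle for reproducibility
    n_train = int(len(ids) * train_ratio)
    n_val = int(len(ids) * val_ratio)
    return [ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]]


train, val, test = split_ids(
    list(range(1, 11)), train_ratio=0.8, val_ratio=0.1, test_ratio=0.1
)
print(len(train), len(val), len(test))  # 8 1 1
```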
trainable()
Check whether the dataset is trainable.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| bool | `bool` | True if trainable, False otherwise. |