Dataset

add_annotations(annotations)

Add "Annotation"s to dataset.

Parameters:

Name Type Description Default
annotations list[Annotation]

list of "Annotation"s

required

add_categories(categories)

Add "Category"s to dataset.

Parameters:

Name Type Description Default
categories list[Category]

list of "Category"s

required

add_images(images)

Add "Image"s to dataset.

Parameters:

Name Type Description Default
images list[Image]

list of "Image"s

required

add_predictions(predictions)

Add "Prediction"s to dataset.

Parameters:

Name Type Description Default
predictions list[Annotation]

list of "Annotation"s to register as predictions

required

check_trainable()

Check if Dataset is trainable or not.

Raises:

Type Description
ValueError

if the dataset does not have enough annotations.
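A minimal sketch of how check_trainable() can relate to the trainable() predicate documented below. The MiniDataset class and its annotation threshold are hypothetical stand-ins, not the library's actual logic:

```python
class MiniDataset:
    """Toy stand-in for Dataset, illustrating the trainable/check_trainable pair."""

    def __init__(self, num_annotations: int, min_required: int = 1):
        self.num_annotations = num_annotations
        self.min_required = min_required  # hypothetical threshold

    def trainable(self) -> bool:
        # Predicate form: returns a bool instead of raising.
        return self.num_annotations >= self.min_required

    def check_trainable(self) -> None:
        # Assertion form: raises ValueError when there are too few annotations.
        if not self.trainable():
            raise ValueError("Dataset does not have enough annotations to train.")
```

Use trainable() when you want to branch on the result, and check_trainable() when you want to fail fast before starting a training run.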

clone(src_name, name, src_root_dir=None, root_dir=None) classmethod

Clone an existing dataset into a new dataset.

Parameters:

Name Type Description Default
src_name str

Dataset name to clone. It must be a Waffle-created dataset.

required
name str

New Dataset name

required
src_root_dir str

Source Dataset root directory. Defaults to None.

None
root_dir str

New Dataset root directory. Defaults to None.

None

Raises:

Type Description
FileNotFoundError

if source dataset does not exist.

FileExistsError

if new dataset name already exists.

Examples:

>>> ds = Dataset.clone("my_dataset", "my_dataset_clone")
>>> ds.name
'my_dataset_clone'  # cloned dataset name
>>> ds.task
'CLASSIFICATION'   # original dataset task

Returns:

Name Type Description
Dataset Dataset

Dataset Class

delete()

Delete Dataset

dummy(name, task, image_num=100, category_num=10, unlabeled_image_num=0, root_dir=None) classmethod

Create Dummy Dataset (for debugging).

Parameters:

Name Type Description Default
name str

Dataset name

required
task str

Dataset task

required
image_num int

Number of images. Defaults to 100.

100
category_num int

Number of categories. Defaults to 10.

10
unlabeled_image_num int

Number of unlabeled images. Defaults to 0.

0
root_dir str

Dataset root directory. Defaults to None.

None

Raises:

Type Description
FileExistsError

if dataset name already exists

Examples:

>>> ds = Dataset.dummy("my_dataset", "CLASSIFICATION", image_num=100, category_num=10)
>>> len(ds.get_images())
100
>>> len(ds.get_categories())
10

export(data_type)

Export the dataset to a specific data format.

Parameters:

Name Type Description Default
data_type Union[str, DataType]

export data type. one of ["YOLO", "COCO"].

required

Raises:

Type Description
ValueError

if data_type is not one of DataType.
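One common way to validate such a Union[str, DataType] argument is an enum lookup. The sketch below uses a stand-in DataType enum and a hypothetical resolve_data_type helper, not the library's actual implementation:

```python
from enum import Enum


class DataType(Enum):
    # Stand-in enum covering only the two documented formats.
    YOLO = "YOLO"
    COCO = "COCO"


def resolve_data_type(data_type):
    # Accept either a DataType member or its name as a string (case-insensitive),
    # mirroring the Union[str, DataType] signature of export().
    if isinstance(data_type, DataType):
        return data_type
    try:
        return DataType[str(data_type).upper()]
    except KeyError:
        valid = [t.name for t in DataType]
        raise ValueError(f"data_type must be one of {valid}, got {data_type!r}")
```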

Examples:

>>> dataset = Dataset.load("some_dataset")
>>> dataset.export(data_type="YOLO")
path/to/dataset_dir/exports/yolo

You can train with exported dataset

>>> hub.train("path/to/dataset_dir/exports/yolo", ...)

Returns:

Name Type Description
str str

exported dataset directory

from_autocare_dlt(name, task, coco_file, coco_root_dir, root_dir=None) classmethod

Import a dataset from autocare dlt format.

Parameters:

Name Type Description Default
name str

name of dataset.

required
task str

task of dataset.

required
coco_file Union[str, list[str]]

coco annotation file path.

required
coco_root_dir Union[str, list[str]]

root directory of coco dataset.

required
root_dir str

root directory of dataset. Defaults to None.

None

Raises:

Type Description
FileExistsError

if new dataset name already exists.

Examples:

Import one coco json file.

>>> ds = Dataset.from_autocare_dlt("my_dataset", "object_detection", "path/to/coco.json", "path/to/coco_root")
>>> ds.get_images()
{<Image: 1>, <Image: 2>, <Image: 3>, <Image: 4>, <Image: 5>}
>>> ds.get_annotations()
{<Annotation: 1>, <Annotation: 2>, <Annotation: 3>, <Annotation: 4>, <Annotation: 5>}
>>> ds.get_categories()
{<Category: 1>, <Category: 2>, <Category: 3>, <Category: 4>, <Category: 5>}
>>> ds.get_category_names()
['person', 'bicycle', 'car', 'motorcycle', 'airplane']

Returns:

Name Type Description
Dataset Dataset

Dataset Class.

from_coco(name, task, coco_file, coco_root_dir, root_dir=None) classmethod

Import a dataset from COCO format.

Parameters:

Name Type Description Default
name str

Dataset name.

required
task str

Dataset task.

required
coco_file Union[str, list[str]]

Coco json file path. If a list is given, the files are regarded as the [train, val, test] json files.

required
coco_root_dir Union[str, list[str]]

Coco image root directory. If a list is given, the directories are regarded as the [train, val, test] image roots.

required
root_dir str

Dataset root directory. Defaults to None.

None

Raises:

Type Description
FileExistsError

if new dataset name already exists.

Examples:

Import one coco json file.

>>> ds = Dataset.from_coco("my_dataset", "object_detection", "path/to/coco.json", "path/to/coco_root")
>>> ds.get_images()
{<Image: 1>, <Image: 2>, <Image: 3>, <Image: 4>, <Image: 5>}
>>> ds.get_annotations()
{<Annotation: 1>, <Annotation: 2>, <Annotation: 3>, <Annotation: 4>, <Annotation: 5>}
>>> ds.get_categories()
{<Category: 1>, <Category: 2>, <Category: 3>, <Category: 4>, <Category: 5>}
>>> ds.get_category_names()
['person', 'bicycle', 'car', 'motorcycle', 'airplane']

Import multiple coco json files.

You can give coco_file as list.

The given coco files are regarded as [train[, val[, test]]] json files.

>>> ds = Dataset.from_coco("my_dataset", "object_detection", ["coco_train.json", "coco_val.json"], ["coco_train_root", "coco_val_root"])
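The list handling described above can be sketched as follows. pair_coco_splits is a hypothetical helper illustrating how the [train, val, test] convention might be resolved, including the case where a single image root serves every split:

```python
def pair_coco_splits(coco_files, coco_root_dirs):
    # Normalize the Union[str, list[str]] arguments and pair each json file
    # with its image root and a split name, in [train, val, test] order.
    if isinstance(coco_files, str):
        coco_files = [coco_files]
    if isinstance(coco_root_dirs, str):
        # A single root directory may serve every split.
        coco_root_dirs = [coco_root_dirs] * len(coco_files)
    if len(coco_files) != len(coco_root_dirs):
        raise ValueError("coco_file and coco_root_dir must have the same length")
    if not 1 <= len(coco_files) <= 3:
        raise ValueError("expected 1 to 3 coco files: [train[, val[, test]]]")
    split_names = ["train", "val", "test"]
    return list(zip(split_names, coco_files, coco_root_dirs))
```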

Returns:

Name Type Description
Dataset Dataset

Dataset Class

from_transformers(name, task, dataset_dir, root_dir=None) classmethod

Import a dataset from a transformers dataset directory.

Parameters:

Name Type Description Default
name str

Dataset name.

required
dataset_dir str

Transformers dataset directory.

required
task str

Task name.

required
root_dir str

Dataset root directory. Defaults to None.

None

Raises:

Type Description
FileExistsError

if dataset name already exists

ValueError

if dataset is not Dataset or DatasetDict

Examples:

>>> ds = Dataset.from_transformers("transformers", "object_detection", "path/to/transformers/dataset")

Returns:

Name Type Description
Dataset Dataset

Dataset Class

from_yolo(name, task, yaml_path, root_dir=None) classmethod

Import a dataset from YOLO (Ultralytics) format using a dataset yaml file.

Parameters:

Name Type Description Default
name str

Dataset name.

required
task str

Dataset task.

required
yaml_path str

Yolo yaml file path.

required
root_dir str

Dataset root directory. Defaults to None.

None

Examples:

>>> ds = Dataset.from_yolo("yolo", "classification", "path/to/yolo.yaml")

Returns:

Name Type Description
Dataset Dataset

Imported dataset.

get_annotations(image_id=None)

Get "Annotation"s.

Parameters:

Name Type Description Default
image_id int

image id. None for all "Annotation"s. Defaults to None.

None

Returns:

Type Description
list[Annotation]

list[Annotation]: "Annotation" list

get_categories(category_ids=None)

Get "Category"s.

Parameters:

Name Type Description Default
category_ids list[int]

id list. None for all "Category"s. Defaults to None.

None

Returns:

Type Description
list[Category]

list[Category]: "Category" list

get_dataset_info()

Get DatasetInfo.

Returns:

Name Type Description
DatasetInfo DatasetInfo

DatasetInfo

get_dataset_list(root_dir=None) classmethod

Get dataset name list in root_dir.

Parameters:

Name Type Description Default
root_dir str

dataset root directory. Defaults to None.

None

Returns:

Type Description
list[str]

list[str]: dataset name list.
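A plausible sketch of such a listing, assuming each dataset lives in its own sub-directory of root_dir. The helper name and the directory layout are assumptions, not the library's actual implementation:

```python
from pathlib import Path


def list_datasets(root_dir: str) -> list[str]:
    # Treat every sub-directory of root_dir as a dataset; plain files are
    # ignored, and names are sorted for stable output.
    root = Path(root_dir)
    if not root.exists():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```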

get_images(image_ids=None, labeled=True)

Get "Image"s.

Parameters:

Name Type Description Default
image_ids list[int]

id list. None for all "Image"s. Defaults to None.

None
labeled bool

get labeled images. False for unlabeled images. Defaults to True.

True

Returns:

Type Description
list[Image]

list[Image]: "Image" list

get_predictions(image_id=None)

Get "Prediction"s.

Parameters:

Name Type Description Default
image_id int

image id. None for all "Prediction"s. Defaults to None.

None

Returns:

Type Description
list[Annotation]

list[Annotation]: "Prediction" list

get_split_ids()

Get split ids

Returns:

Type Description
list[list[int]]

list[list[int]]: split ids

initialize()

Initialize Dataset. It creates necessary directories under {dataset_root_dir}/{dataset_name}.

initialized()

Check if Dataset has been initialized or not.

Returns:

Name Type Description
bool bool

True if initialized, False otherwise.

load(name, root_dir=None) classmethod

Load an existing dataset.

Parameters:

Name Type Description Default
name str

Name of a Waffle-created dataset

required
root_dir str

Dataset root directory. Defaults to None.

None

Raises:

Type Description
FileNotFoundError

if source dataset does not exist.

Examples:

>>> ds = Dataset.load("my_dataset")
>>> ds.name
'my_dataset'  # dataset name

Returns:

Name Type Description
Dataset Dataset

Dataset Class

merge(name, root_dir, src_names, src_root_dirs, task) classmethod

Merge multiple datasets into one dataset.

Parameters:

Name Type Description Default
name str

New Dataset name

required
root_dir str

New Dataset root directory

required
src_names list[str]

Source Dataset names

required
src_root_dirs Union[str, list[str]]

Source Dataset root directories

required
task str

Dataset task

required

Returns:

Name Type Description
Dataset Dataset

Dataset Class
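Merging datasets requires reconciling ids from the sources so they do not collide. The sketch below shows only the category side with a hypothetical merge_categories helper; the actual merge() must also remap image and annotation ids, which is omitted here:

```python
def merge_categories(src_category_lists):
    # Merge category name lists from several source datasets into one
    # name-to-id mapping, de-duplicating by name (ids start at 1).
    merged: dict[str, int] = {}
    for names in src_category_lists:
        for name in names:
            if name not in merged:
                merged[name] = len(merged) + 1
    return merged
```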

new(name, task, root_dir=None) classmethod

Create a new empty dataset. This method creates a new dataset directory and initializes the dataset info file. If you already have data in another format, use one of the from_* methods instead.

Parameters:

Name Type Description Default
name str

Dataset name

required
task str

Dataset task

required
root_dir str

Dataset root directory. Defaults to None.

None

Raises:

Type Description
FileExistsError

if dataset name already exists

Examples:

>>> ds = Dataset.new("my_dataset", "CLASSIFICATION")
>>> ds.name
'my_dataset'  # dataset name
>>> ds.task  # dataset task
'CLASSIFICATION'

Returns:

Name Type Description
Dataset Dataset

Dataset Class

sample(name, task, root_dir=None) classmethod

Import sample Dataset.

Parameters:

Name Type Description Default
name str

Dataset name.

required
task str

Task name.

required
root_dir str

Dataset root directory. Defaults to None.

None

Returns:

Name Type Description
Dataset Dataset

Dataset Class

split(train_ratio, val_ratio=0.0, test_ratio=0.0, method=SplitMethod.RANDOM, seed=0)

Split Dataset to train, validation, test, (unlabeled) sets.

Parameters:

Name Type Description Default
train_ratio float

train set ratio (0 ~ 1).

required
val_ratio float

validation set ratio (0 ~ 1).

0.0
test_ratio float

test set ratio (0 ~ 1).

0.0
method Union[str, SplitMethod]

split method. Defaults to SplitMethod.RANDOM.

SplitMethod.RANDOM
seed int

random seed. Defaults to 0.

0

Raises:

Type Description
ValueError

if train_ratio is not between 0.0 and 1.0.

ValueError

if train_ratio + val_ratio + test_ratio is not 1.0.

Examples:

>>> dataset = Dataset.load("some_dataset")
>>> dataset.split(train_ratio=0.8, val_ratio=0.1, test_ratio=0.1)
>>> dataset.get_split_ids()
[[1, 2, 3, 4, 5, 6, 7, 8], [9], [10], []]  # train, val, test, unlabeled image ids
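The behaviour of SplitMethod.RANDOM can be sketched as a deterministic seeded shuffle followed by ratio slicing. random_split_ids is a hypothetical helper, and the real method also tracks an unlabeled split, which is omitted from this sketch:

```python
import random


def random_split_ids(image_ids, train_ratio, val_ratio=0.0, test_ratio=0.0, seed=0):
    # Validate the ratios as documented for split().
    if not 0.0 < train_ratio <= 1.0:
        raise ValueError("train_ratio must be between 0.0 and 1.0")
    if abs(train_ratio + val_ratio + test_ratio - 1.0) > 1e-9:
        raise ValueError("train_ratio + val_ratio + test_ratio must be 1.0")

    # Shuffle deterministically by seed, then slice by the given ratios.
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_ratio)
    n_val = round(len(ids) * val_ratio)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return [train, val, test]
```

Because the shuffle is driven by random.Random(seed), the same seed always yields the same split, which keeps experiments reproducible.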

trainable()

Check if Dataset is trainable or not.

Returns:

Name Type Description
bool bool

True if trainable, False otherwise.