
CombinedDataset

A combined dataset concatenates two underlying datasets. It returns every item from the first dataset, followed by every item from the second dataset.

Creating a combined dataset
from loadax import CombinedDataset, SimpleDataset

dataset1 = SimpleDataset([1, 2, 3, 4, 5])
dataset2 = SimpleDataset([6, 7, 8, 9, 10])
combined_dataset = CombinedDataset(dataset1, dataset2)

for i in range(len(combined_dataset)):
    print(combined_dataset.get(i))

#> 1
#> 2
#> 3
#> 4
#> 5
#> 6
#> 7
#> 8
#> 9
#> 10

Bases: Dataset[Example], Generic[Example]

A dataset that combines two datasets sequentially.

This dataset type allows you to concatenate two datasets, creating a new dataset that contains all elements from the first dataset followed by all elements from the second dataset.
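The source excerpt on this page only shows `__init__`, so the lookup logic itself is not documented here. The following is a minimal sketch of how sequential concatenation typically maps a global index onto the two underlying datasets. It assumes the dataset exposes `len()` and `get(index)`, as in the usage example above. `ListDataset` and `ConcatDataset` are hypothetical stand-ins, not loadax classes.

```python
from typing import Generic, Sequence, TypeVar

Example = TypeVar("Example")


class ListDataset(Generic[Example]):
    """Hypothetical stand-in for a simple in-memory dataset (not loadax's SimpleDataset)."""

    def __init__(self, items: Sequence[Example]):
        self.items = list(items)

    def __len__(self) -> int:
        return len(self.items)

    def get(self, index: int) -> Example:
        return self.items[index]


class ConcatDataset(Generic[Example]):
    """Hypothetical sketch of sequential concatenation of two datasets."""

    def __init__(self, dataset1, dataset2):
        self.dataset1 = dataset1
        self.dataset2 = dataset2

    def __len__(self) -> int:
        # The combined length is the sum of both parts.
        return len(self.dataset1) + len(self.dataset2)

    def get(self, index: int) -> Example:
        # This sketch rejects negative and out-of-range indices outright.
        if index < 0 or index >= len(self):
            raise IndexError(f"index {index} out of range for length {len(self)}")
        first_len = len(self.dataset1)
        if index < first_len:
            # Indices [0, first_len) come from the first dataset.
            return self.dataset1.get(index)
        # Remaining indices are shifted down into the second dataset.
        return self.dataset2.get(index - first_len)


combined = ConcatDataset(ListDataset([1, 2, 3]), ListDataset([4, 5]))
print([combined.get(i) for i in range(len(combined))])  # [1, 2, 3, 4, 5]
```

Only one length comparison is needed per lookup, because there are exactly two parts; the offset subtracted for the second dataset is simply the length of the first.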

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| dataset1 | Dataset[Example] | The first dataset to be combined. | required |
| dataset2 | Dataset[Example] | The second dataset to be combined. | required |
Source code in src/loadax/dataset/combined_dataset.py
def __init__(self, dataset1: Dataset[Example], dataset2: Dataset[Example]):
    """Initialize the CombinedDataset.

    Args:
        dataset1: The first dataset to be combined.
        dataset2: The second dataset to be combined.
    """
    self.dataset1 = dataset1
    self.dataset2 = dataset2
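The constructor above accepts exactly two datasets. Because a combined dataset is itself a `Dataset[Example]`, one plausible way to chain three or more is to nest them. The sketch below assumes that nesting works and uses hypothetical stand-ins (`ListDataset`, `PairDataset`, `combine_all`) rather than the loadax classes. It folds a list of datasets left to right with `functools.reduce`, producing `((d1, d2), d3)` and so on.

```python
from functools import reduce
from typing import Generic, List, Sequence, TypeVar

Example = TypeVar("Example")


class ListDataset(Generic[Example]):
    """Hypothetical in-memory dataset exposing len() and get()."""

    def __init__(self, items: Sequence[Example]):
        self.items = list(items)

    def __len__(self) -> int:
        return len(self.items)

    def get(self, index: int) -> Example:
        return self.items[index]


class PairDataset(Generic[Example]):
    """Hypothetical two-input combined dataset mirroring the __init__ above."""

    def __init__(self, dataset1, dataset2):
        self.dataset1 = dataset1
        self.dataset2 = dataset2

    def __len__(self) -> int:
        return len(self.dataset1) + len(self.dataset2)

    def get(self, index: int) -> Example:
        if not 0 <= index < len(self):
            raise IndexError(index)
        first_len = len(self.dataset1)
        if index < first_len:
            return self.dataset1.get(index)
        return self.dataset2.get(index - first_len)


def combine_all(datasets: List) -> PairDataset:
    """Left-fold datasets into nested pairs: ((d1, d2), d3), ..."""
    if len(datasets) < 2:
        raise ValueError("need at least two datasets to combine")
    # reduce calls PairDataset(acc, next_dataset) for each remaining dataset.
    return reduce(PairDataset, datasets)


combined = combine_all([ListDataset([1, 2]), ListDataset([3]), ListDataset([4, 5, 6])])
print([combined.get(i) for i in range(len(combined))])  # [1, 2, 3, 4, 5, 6]
```

Each lookup walks down the nesting, so its cost grows with the number of chained datasets. For long chains, a single class that stores cumulative offsets for all inputs would avoid that, but this page only documents the two-argument form.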