Dataloader
The Dataloader is the main interface for loading data into your training loop. It defines how data is loaded efficiently from a dataset and handles batching, placement on the appropriate devices, and the other details that make up proper data loading.
The Dataloader works by spawning background workers that prefetch data into a cache, then filling batches from the cache as data becomes available. Because the work happens in background workers, loading does not block the main thread, which is important for training loops. Loadax takes care of the parallelization details for you, so your data loading is fast, reliable, and simple. The background cache may be filled out of order, because it uses multithreading to load data in parallel, but the batches you receive are always in order. This is because loadax ensures deterministic ordering of batches, and the background workers load batches in the order in which they are requested.
from loadax import Dataloader, SimpleDataset

dataset = SimpleDataset([1, 2, 3, 4, 5])
dataloader = Dataloader(
    dataset=dataset,
    batch_size=2,
    num_workers=2,
    prefetch_factor=2,
)
for batch in dataloader:
    print(batch)
    #> [1, 2]
    #> [3, 4]
    #> [5]
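To make the "out-of-order loading, in-order batches" idea concrete, here is a minimal standard-library sketch. It is not loadax's implementation: `load_item` and `ordered_batches` are hypothetical names, and unlike a real prefetcher it submits every load at once instead of bounding the cache with a prefetch factor. It only shows how consuming results in request order keeps batches deterministic even when worker threads finish in any order.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def load_item(index: int) -> int:
    """Hypothetical stand-in for an expensive read.

    Sleeps for a random time so that loads complete out of order.
    """
    time.sleep(random.uniform(0.0, 0.01))
    return index + 1


def ordered_batches(num_items: int, batch_size: int, num_workers: int):
    """Yield batches in request order even though workers finish in any order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Submit every load up front; completion order is not guaranteed.
        futures = [pool.submit(load_item, i) for i in range(num_items)]
        batch = []
        # Walk the futures in submission order, not completion order.
        for future in futures:
            batch.append(future.result())  # blocks until this specific item is ready
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # final, possibly incomplete batch
            yield batch


batches = list(ordered_batches(num_items=5, batch_size=2, num_workers=2))
print(batches)  # [[1, 2], [3, 4], [5]]
```

The ordering guarantee comes entirely from reading the futures in the order they were submitted; the thread pool itself makes no promise about which load finishes first.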
Bases: Generic[Example]
Dataloader that loads batches in the background or synchronously.
Example
from loadax.experimental.dataset.simple import SimpleDataset
from loadax.experimental.loader import Dataloader

dataset = SimpleDataset([1, 2, 3, 4, 5])
dataloader = Dataloader(
    dataset=dataset,
    batch_size=2,
    num_workers=2,
    prefetch_factor=2,
    drop_last=False,
)
for batch in dataloader:
    print(batch)
    #> [1, 2]
    #> [3, 4]
    #> [5]
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | Dataset | The dataset to load data from. | required |
batch_size | int | The size of each batch. | required |
num_workers | int | The number of workers to use for parallel data loading. If 0, data will be loaded synchronously. | 0 |
prefetch_factor | int | The prefetch factor to use for prefetching. If 0, no prefetching will occur. | 0 |
drop_last | bool | Whether to drop the last incomplete batch. | False |
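As an illustration of how `batch_size` and `drop_last` interact, the following sketch splits a list into batches with plain Python. `split_into_batches` is a hypothetical helper written for this example and is not part of loadax; it assumes the usual meaning of `drop_last`, where a trailing batch smaller than `batch_size` is discarded.

```python
def split_into_batches(items, batch_size, drop_last=False):
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    # Discard the trailing batch only if it is shorter than batch_size.
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


kept = split_into_batches([1, 2, 3, 4, 5], batch_size=2, drop_last=False)
dropped = split_into_batches([1, 2, 3, 4, 5], batch_size=2, drop_last=True)
print(kept)     # [[1, 2], [3, 4], [5]]
print(dropped)  # [[1, 2], [3, 4]]
```

With five items and a batch size of two, keeping the last batch yields three batches, while dropping it yields two full batches.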
Source code in src/loadax/dataloader/loader.py
Bases: Generic[Example]
Iterator for the dataloader.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataloader | Dataloader | The dataloader to iterate over. | required |
Source code in src/loadax/dataloader/loader.py
progress
progress() -> Progress
Get the progress of the dataloader.
Source code in src/loadax/dataloader/loader.py
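The shape of an iterator that exposes a progress query can be sketched as below. Everything here is hypothetical: `ToyIterator` and `ToyProgress` are illustration-only names, and the fields of the real `Progress` type are not documented on this page, so the two counters are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyProgress:
    """Hypothetical progress record; the real Progress type may carry different fields."""
    items_processed: int
    items_total: int


class ToyIterator:
    """Hypothetical synchronous iterator that tracks how far it has advanced."""

    def __init__(self, items, batch_size):
        self._items = list(items)
        self._batch_size = batch_size
        self._cursor = 0  # index of the next item to hand out

    def __iter__(self):
        return self

    def __next__(self):
        if self._cursor >= len(self._items):
            raise StopIteration
        batch = self._items[self._cursor:self._cursor + self._batch_size]
        self._cursor += len(batch)
        return batch

    def progress(self) -> ToyProgress:
        """Report how many items have been yielded so far."""
        return ToyProgress(self._cursor, len(self._items))


it = ToyIterator([1, 2, 3, 4, 5], batch_size=2)
first = next(it)
after_first = it.progress()
remaining = list(it)
after_all = it.progress()
print(first, after_first)
print(remaining, after_all)
```

Keeping the cursor on the iterator rather than on the loader means each pass over the data reports its own progress independently.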