
DataLoaders in PyTorch

PyTorch provides `Dataset` and `DataLoader` to handle data loading, batching, shuffling, and parallel processing. This chapter shows how to load built‑in datasets and custom data.

Built‑in Datasets (torchvision)

import torch
import torchvision
import torchvision.transforms as transforms

# Convert images to tensors and normalize pixel values to roughly [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)
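Once the loader is built, training code simply iterates over it to get batches. The following sketch shows the iteration pattern; it substitutes a small synthetic `TensorDataset` (random tensors shaped like MNIST's 1×28×28 images) for the real download, so the shapes and names are illustrative assumptions, not the actual MNIST data.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Synthetic stand-in for MNIST: 100 fake 1x28x28 images with labels 0-9
images = torch.randn(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))

loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

for batch_images, batch_labels in loader:
    # Each iteration yields one batch: images [32, 1, 28, 28], labels [32]
    break
```

The same loop works unchanged with the real `trainloader` above, since `DataLoader` always yields whatever a single `__getitem__` call returns, stacked along a new batch dimension.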

Custom Dataset

import pandas as pd
import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, csv_file):
        self.data = pd.read_csv(csv_file)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        row = self.data.iloc[idx]
        # All columns except the last are features; the last column is the label
        x = torch.tensor(row.iloc[:-1].to_numpy(), dtype=torch.float32)
        y = torch.tensor(row.iloc[-1], dtype=torch.long)
        return x, y
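To see the protocol in action without needing a CSV file on disk, here is a minimal sketch using the same pattern but built from an in-memory DataFrame; the column names and values are made up for illustration.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset

class InMemoryDataset(Dataset):
    # Same __len__/__getitem__ protocol, but fed a DataFrame directly
    def __init__(self, df):
        self.data = df

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        row = self.data.iloc[idx]
        x = torch.tensor(row.iloc[:-1].to_numpy(), dtype=torch.float32)
        y = torch.tensor(row.iloc[-1], dtype=torch.long)
        return x, y

# Hypothetical two-feature dataset with an integer label column
df = pd.DataFrame({"f1": [0.1, 0.2], "f2": [1.0, 2.0], "label": [0, 1]})
ds = InMemoryDataset(df)
x, y = ds[0]  # x is a length-2 float tensor, y a scalar long tensor
```

Because the class implements `__len__` and `__getitem__`, it can be passed straight to `DataLoader` just like the torchvision datasets.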

DataLoader Parameters

  • batch_size: number of samples per batch.
  • shuffle: reshuffle the data at the start of every epoch (recommended for training).
  • num_workers: number of subprocesses used to load data in parallel (0 loads in the main process).
  • drop_last: drop the final incomplete batch when the dataset size is not divisible by batch_size (useful when a model needs every batch to be full, e.g. with batch normalization).
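A short sketch of these parameters working together, using a small synthetic dataset (the sizes are chosen purely to make the arithmetic visible): 10 samples split into batches of 3 leave a final batch of 1, which `drop_last=True` discards.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 10 samples of 4 features each, with dummy labels 0-9
ds = TensorDataset(torch.randn(10, 4), torch.arange(10))

# Batches of 3, 3, 3, 1 -> drop_last discards the incomplete batch of 1,
# so the loader yields exactly 3 full batches per epoch.
loader = DataLoader(ds, batch_size=3, shuffle=True, num_workers=0, drop_last=True)
```

Note that `len(loader)` reports the number of batches (3 here), not the number of samples.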


Two Minute Drill
  • `Dataset` defines how to access samples.
  • `DataLoader` batches, shuffles, and parallelizes.
  • Use torchvision for standard datasets (MNIST, CIFAR, ImageNet).
  • Custom datasets require `__len__` and `__getitem__`.

Need more clarification?

Drop us an email at career@quipoinfotech.com