Data Preprocessing for DL

Before feeding data into a deep learning model, proper preprocessing is crucial. This includes normalization, handling missing values, data augmentation, and creating validation splits.

Normalization (Feature Scaling)

Neural networks usually train faster and more stably when input features are small in magnitude, roughly centered around zero, and on similar scales. Common methods:
  • Min‑Max scaling: x_scaled = (x - min) / (max - min) → range [0, 1].
  • Standardization: x_scaled = (x - mean) / std → mean 0, std 1 (a common default for deep learning).
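from sklearn.preprocessing import StandardScaler
# X: array of shape (n_samples, n_features), ideally the training split only
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # learns per-feature mean and std, then scales
# Reuse scaler.transform(...) on validation and test data instead of refitting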
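
To make the two formulas concrete, here is a minimal sketch that applies them by hand and compares the result with StandardScaler. The toy arrays X_train and X_val are made up for illustration. The scaler is fitted on the training data only, and the same statistics are reused for the validation data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]])
X_val = np.array([[2.5, 250.0], [5.0, 500.0]])

# Standardization with scikit-learn: fit on training data only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)  # reuses the training mean and std

# The same standardization by hand: (x - mean) / std, per feature
manual_std = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)

# Min-Max scaling by hand: (x - min) / (max - min), per feature
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)
manual_minmax = (X_train - col_min) / (col_max - col_min)
```

After scaling, both features live on the same scale even though the raw values differed by a factor of 100. Validation values can fall outside the training range, which is expected.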

Handling Missing Values

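Most deep learning models cannot handle NaN values: a single NaN in the input propagates through the forward pass and turns the loss and gradients into NaN. Options:
  • Remove rows with missing values (if only a few rows are affected).
  • Impute with the mean, median, or mode of the feature.
  • Use a mask or embedding to indicate missingness (advanced).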
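
The options above can be sketched with NumPy and scikit-learn's SimpleImputer. The small array X is invented for illustration. Setting add_indicator=True appends one 0/1 column per feature that had missing values, which is one simple way to give the model a missingness mask.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data with one missing value in each column
X = np.array([
    [1.0, np.nan],
    [2.0, 10.0],
    [np.nan, 30.0],
    [3.0, 20.0],
])

# Option 1: drop every row that contains at least one NaN
X_dropped = X[~np.isnan(X).any(axis=1)]

# Options 2 and 3: impute with the column median and append
# indicator columns that mark where values were missing
imputer = SimpleImputer(strategy="median", add_indicator=True)
X_imputed = imputer.fit_transform(X)
```

As with scaling, it is safer to fit the imputer on the training split and call transform on validation and test data.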

Data Augmentation

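For image data, augmentation artificially enlarges the training set by applying random transformations such as rotation, flipping, zoom, and brightness changes. Because the model sees a slightly different version of each image every epoch, this typically reduces overfitting and improves generalization. Apply augmentation to the training set only, not to validation or test data.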
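from torchvision import transforms

transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),       # flip left-right with probability 0.5
    transforms.RandomRotation(10),           # rotate by a random angle in [-10, 10] degrees
    transforms.ColorJitter(brightness=0.2),  # scale brightness by a factor in [0.8, 1.2]
    transforms.ToTensor(),                   # convert a PIL image to a float tensor in [0, 1]
])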
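
To show what these random transformations do to the pixel values, here is a rough NumPy sketch of a horizontal flip and a brightness jitter. It is not how torchvision implements them. The function name augment, its parameters, and the random test image are all hypothetical.

```python
import numpy as np

def augment(image, rng, brightness=0.2):
    """Randomly flip an (H, W, C) float image in [0, 1] and jitter its brightness."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # horizontal flip: reverse the width axis
    # Brightness factor drawn uniformly from [1 - brightness, 1 + brightness]
    factor = rng.uniform(1 - brightness, 1 + brightness)
    return np.clip(image * factor, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # stand-in for a real image

# Each call yields a different variant of the same source image
batch = np.stack([augment(image, rng) for _ in range(4)])
```

Because the randomness is drawn anew on every call, the same source image produces a different variant each time it is loaded.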

Train / Validation / Test Split

A typical split is 70% train, 15% validation, and 15% test. Use the validation set for hyperparameter tuning and model selection, and evaluate on the test set only once, at the end.
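from sklearn.model_selection import train_test_split

# Hold out 30% of the data, then split that part in half: 15% validation, 15% test
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)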
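
For classification, it can help to keep the class proportions the same in every split. The sketch below uses invented data (200 samples, two balanced classes) with the stratify argument of train_test_split, then prints the split sizes as a sanity check. The seed value 42 is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy dataset: 200 samples, 2 features, balanced binary labels
X = np.arange(400).reshape(200, 2)
y = np.array([0, 1] * 100)

# Step 1: 70% train, 30% held out, preserving the class ratio
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Step 2: split the held-out 30% in half for validation and test
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42
)

print(len(X_train), len(X_val), len(X_test))
```

The three sizes should come out at roughly 70%, 15%, and 15% of the data, with no sample appearing in more than one split.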


Two Minute Drill
  • Normalize/standardize inputs for stable training.
  • Handle missing values by dropping or imputing.
  • Data augmentation improves generalization for images.
  • Split into train, validation, test sets.
