Regularization Techniques
Regularization prevents overfitting by adding constraints or noise to the training process. Deep learning uses specialized techniques beyond standard L1/L2.
L1 and L2 Regularization
Add penalty to the loss function: L1 (Lasso) encourages sparsity, L2 (Ridge) shrinks weights. In deep learning, L2 is often called weight decay.
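The two penalty terms can be sketched in plain Python (the function names and `lam`, a regularization-strength parameter, are illustrative; in a framework these terms are added to the training loss):

```python
def l1_penalty(weights, lam):
    # L1: sum of absolute values -- gradient pushes weights toward exactly zero
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    # L2: sum of squares -- shrinks large weights proportionally
    return lam * sum(w * w for w in weights)

weights = [0.5, -1.0, 0.0, 2.0]
print(l1_penalty(weights, 0.01))  # 0.035
print(l2_penalty(weights, 0.01))  # 0.0525
```

Note how the L2 penalty is dominated by the largest weight (2.0 contributes 4.0 of the 5.25 squared sum), which is why it shrinks large weights hardest, while L1 penalizes every nonzero weight at the same rate, encouraging sparsity.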
# PyTorch (L2 via weight_decay)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)

Dropout
Randomly zeroes a fraction of neurons during training, forcing the network to learn redundant representations instead of relying on any single unit. At test time dropout is disabled; with the "inverted dropout" used by modern frameworks (including PyTorch), surviving activations are scaled up during training so no test-time rescaling is needed. Typical dropout rate: 0.2–0.5.
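A minimal plain-Python sketch of inverted dropout (the `dropout` helper is illustrative, not a library function):

```python
import random

def dropout(x, p=0.3, training=True):
    # Inverted dropout: zero each element with probability p during training,
    # scale survivors by 1/(1-p) so the expected activation stays the same.
    if not training or p == 0.0:
        return list(x)  # no-op at eval/test time
    return [0.0 if random.random() < p else v / (1.0 - p) for v in x]

acts = [1.0, 2.0, 3.0, 4.0]
print(dropout(acts, p=0.5))            # roughly half zeroed, survivors doubled
print(dropout(acts, training=False))   # [1.0, 2.0, 3.0, 4.0] -- unchanged
```

This is what `nn.Dropout` does under the hood; calling `model.eval()` in PyTorch is what flips it into the `training=False` branch.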
self.dropout = nn.Dropout(0.3)
x = self.dropout(x)

Batch Normalization (BatchNorm)
Normalizes layer outputs to zero mean and unit variance over each mini-batch (the original paper motivates this as reducing internal covariate shift), then applies a learnable scale and shift. BatchNorm also has a regularizing side effect and allows higher learning rates.
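The core normalization step can be sketched in plain Python for a 1-D batch (the `gamma`/`beta` parameters stand in for BatchNorm's learnable scale and shift; `eps` avoids division by zero):

```python
import math

def batch_norm(batch, eps=1e-5, gamma=1.0, beta=0.0):
    # Normalize the batch to zero mean / unit variance,
    # then apply the learnable scale (gamma) and shift (beta).
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

out = batch_norm([1.0, 2.0, 3.0, 4.0])
print(out)  # symmetric around 0 with unit variance
```

At inference time the real layer uses running estimates of the mean and variance accumulated during training, since a single test example has no meaningful batch statistics.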
self.bn = nn.BatchNorm2d(num_features)

Early Stopping
Monitor validation loss and stop training when it stops improving. This prevents overfitting and saves compute. Typically, the weights from the epoch with the lowest validation loss are restored afterward.
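The logic above can be sketched as a patience-based loop (the function and `val_losses` list are illustrative; in practice each value would come from evaluating the model after an epoch):

```python
def train_with_early_stopping(val_losses, patience=3):
    # Stop once validation loss has failed to improve for `patience` epochs.
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss          # in real code: also checkpoint weights here
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                 # patience exhausted -- stop training
    return best_epoch, best_loss      # restore the checkpoint from best_epoch

print(train_with_early_stopping([1.0, 0.8, 0.7, 0.9, 0.95, 0.96, 0.97]))
# → (2, 0.7): stops after 3 epochs with no improvement past epoch 2
```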
Data Augmentation (as Regularization)
By generating synthetic variations of training data, augmentation reduces overfitting, especially for image classifiers.
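In practice augmentation is applied on the fly (e.g. `torchvision.transforms.RandomHorizontalFlip` for images). The idea can be sketched in plain Python on a 2-D list of pixels (the helper name is illustrative):

```python
import random

def random_horizontal_flip(image, p=0.5):
    # image: 2-D list of pixel rows; mirror left-right with probability p.
    if random.random() < p:
        return [row[::-1] for row in image]
    return image

img = [[1, 2, 3],
       [4, 5, 6]]
print(random_horizontal_flip(img, p=1.0))  # [[3, 2, 1], [6, 5, 4]]
```

Because a flipped cat is still a cat, each such transform effectively multiplies the training set without collecting new labels, which is what gives augmentation its regularizing effect.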
Two Minute Drill
- L2 weight decay penalizes large weights.
- Dropout randomly drops neurons during training.
- BatchNorm normalizes layer outputs, speeds training.
- Early stopping halts training when validation loss plateaus.
Need more clarification?
Drop us an email at career@quipoinfotech.com
