
Cross-Validation

Cross‑validation is a robust technique for evaluating model performance by splitting data into multiple training/validation folds. It reduces the variance of performance estimates and helps detect overfitting.

K‑fold cross‑validation divides data into k folds, trains on k‑1 folds, validates on the remaining fold, and repeats.

K‑Fold Cross‑Validation (Standard)

1. Shuffle and split data into k equal folds.
2. For each fold i: train on all folds except i, validate on fold i.
3. Average the k validation scores.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)          # example data; substitute your own
model = LogisticRegression(max_iter=1000)  # any scikit-learn estimator works
scores = cross_val_score(model, X, y, cv=5)  # 5-fold
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
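The three steps above can also be written out explicitly with `KFold`. This is a sketch of what `cross_val_score` does internally; the dataset and estimator here (logistic regression on iris) are illustrative stand-ins, not part of the original tutorial:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)   # step 1: shuffle and split into k folds
scores = []
for train_idx, val_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])              # step 2: train on the other k-1 folds
    scores.append(model.score(X[val_idx], y[val_idx])) #         validate on the held-out fold
print(f"Mean accuracy: {np.mean(scores):.3f}")         # step 3: average the k scores
```

Note that a fresh model is fit in every iteration, so no information leaks from one fold's validation set into another fold's training.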

Stratified K‑Fold (for Classification)

Preserves class proportions in each fold. Use for imbalanced datasets.
from sklearn.model_selection import StratifiedKFold, cross_val_score

skf = StratifiedKFold(n_splits=5)  # each fold keeps the overall class ratio
scores = cross_val_score(model, X, y, cv=skf)  # model, X, y as defined above
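To see the stratification at work, here is a small sketch with made-up imbalanced labels (80 samples of class 0, 20 of class 1, chosen purely for illustration). Every validation fold of 20 samples keeps the same 80/20 ratio:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Illustrative imbalanced labels: 80 of class 0, 20 of class 1.
y = np.array([0] * 80 + [1] * 20)
X = np.zeros((100, 1))  # features are irrelevant to the splitting itself

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for _, val_idx in skf.split(X, y):
    # Each 20-sample validation fold keeps the 80/20 ratio: 16 zeros, 4 ones.
    print(np.bincount(y[val_idx]))
```

A plain `KFold` on the same labels could easily produce folds with zero minority-class samples, which makes the validation score meaningless for that class.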

When to Use Which?

  • K‑fold: default choice, works well for most cases.
  • Stratified K‑fold: for classification with imbalanced classes.
  • Leave‑One‑Out (LOO): k = n, very computationally expensive.
  • TimeSeriesSplit: for time series data (no shuffling).
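For the time-series case in the list above, `TimeSeriesSplit` grows the training window forward in time so that validation samples always come after the training samples, avoiding leakage from the future. A minimal sketch with ten time-ordered samples (the data here is illustrative):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)  # 10 samples in temporal order
tscv = TimeSeriesSplit(n_splits=3)
for train_idx, val_idx in tscv.split(X):
    # Training indices always precede validation indices; no shuffling.
    print("train:", train_idx, "validate:", val_idx)
```

Each successive split extends the training window and slides the validation window forward, which is why shuffling is never applied.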


Two Minute Drill
  • Cross‑validation reduces variance in performance estimates.
  • K‑fold (k=5 or 10) is standard.
  • Stratified K‑fold preserves class balance.
  • Use cross_val_score for easy implementation.
