
Feature Scaling

Features often have different scales. For example, age typically spans 0‑100 while income may span 20,000‑200,000. Distance‑based algorithms (k‑NN, SVM) are sensitive to scale, and gradient‑trained models (neural networks) converge faster on scaled inputs – features with larger magnitudes can otherwise dominate. Feature scaling brings all features to a similar range.

Normalization (Min‑Max Scaling)

Scales values to a fixed range, usually [0, 1]. Formula: `(x - min) / (max - min)`.

```python
from sklearn.preprocessing import MinMaxScaler

# df is assumed to be a pandas DataFrame with 'age' and 'income' columns
scaler = MinMaxScaler()
df_scaled = scaler.fit_transform(df[['age', 'income']])  # returns a NumPy array
```

Useful when features have bounded ranges (e.g., pixel intensities 0‑255).
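As a quick sanity check, here is a minimal sketch (with hypothetical age values) comparing `MinMaxScaler` against the formula applied by hand:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

ages = np.array([[20.0], [50.0], [80.0]])  # hypothetical age column

# Library result
scaled = MinMaxScaler().fit_transform(ages)

# Formula applied by hand: (x - min) / (max - min)
manual = (ages - ages.min()) / (ages.max() - ages.min())

print(scaled.ravel())  # [0.  0.5 1. ]
```

The minimum maps to 0, the maximum to 1, and everything else falls linearly in between.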

Standardization (Z‑Score)

Centers data to mean 0 and standard deviation 1. Formula: `(x - mean) / std`.

```python
from sklearn.preprocessing import StandardScaler

# df is assumed to be a pandas DataFrame with 'age' and 'income' columns
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df[['age', 'income']])  # returns a NumPy array
```

Preferred for many algorithms (linear regression, SVM, PCA). Does not assume a bounded range.
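A minimal check (with hypothetical income values) that the transformed column really has mean 0 and standard deviation 1:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

incomes = np.array([[30_000.0], [60_000.0], [90_000.0]])  # hypothetical incomes

z = StandardScaler().fit_transform(incomes)

# StandardScaler uses the population std (ddof=0), matching np.std's default
print(round(z.mean(), 10), round(z.std(), 10))  # 0.0 1.0
```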

Which One to Choose?

  • Normalization (MinMaxScaler): when you need values in a fixed range (e.g., neural networks with sigmoid activation).
  • Standardization (StandardScaler): when features contain outliers or you use PCA, SVM, or linear regression.
  • Tree‑based models (Random Forest, XGBoost) do not need scaling: they split on feature thresholds, so the splits are unchanged by rescaling.

Why Scaling Matters

Without scaling, a feature with large values (e.g., income) would dominate distance calculations, even if it is less important than age. Scaling ensures each feature contributes proportionally.
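A minimal sketch (with hypothetical ages and incomes) of how income's units dominate a raw Euclidean distance, and how standardization removes that effect:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# hypothetical rows: [age, income]
X = np.array([[25.0, 50_000.0],
              [65.0, 51_000.0],
              [45.0, 80_000.0]])

Xs = StandardScaler().fit_transform(X)

def income_share(a, b):
    """Fraction of the squared Euclidean distance contributed by income."""
    d = (a - b) ** 2
    return d[1] / d.sum()

raw = income_share(X[0], X[1])       # income dominates the raw distance
scaled = income_share(Xs[0], Xs[1])  # after scaling, age contributes too
print(raw, scaled)
```

In raw units the income gap of 1,000 swamps the age gap of 40; after standardization both gaps are measured relative to each feature's spread.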


Two Minute Drill
  • Feature scaling brings all features to similar ranges.
  • Normalization (MinMaxScaler) → [0,1] range.
  • Standardization (StandardScaler) → mean 0, std 1.
  • Tree‑based models do not require scaling.

Need more clarification?

Drop us an email at career@quipoinfotech.com