Feature Scaling
Features often have different scales: for example, age (0‑100) vs. income (20,000‑200,000). Algorithms that rely on distances or gradient magnitudes (k‑NN, SVM, neural networks) are sensitive to scale, because features with larger values can dominate those with smaller ones. Feature scaling brings all features into a similar range.
Normalization (Min‑Max Scaling)
Scales values to a fixed range, usually [0, 1]. Formula: `(x - min) / (max - min)`.
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df_scaled = scaler.fit_transform(df[['age', 'income']])

Useful when features have bounded ranges (e.g., pixel intensities 0‑255).

Standardization (Z‑Score)
Centers data to mean 0 and standard deviation 1. Formula: `(x - mean) / std`.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df[['age', 'income']])

Preferred for many algorithms (linear regression, SVM, PCA). Does not assume a bounded range.

Which One to Choose?
- Normalization (MinMaxScaler): When you need values in a fixed range (e.g., neural networks with sigmoid activation).
- Standardization (StandardScaler): When features contain outliers (an outlier stretches the min‑max range and squashes the rest of the data, while standardization is less distorted), or when you use PCA, SVM, or linear regression.
- For tree‑based models (Random Forest, XGBoost), scaling is not needed: trees split on feature thresholds, and monotonic rescaling does not change which side of a split a value falls on.
Why Scaling Matters
Without scaling, a feature with large values (e.g., income) would dominate distance calculations, even if it is less important than age. Scaling ensures each feature contributes proportionally.
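This domination effect can be seen directly with Euclidean distances; the two sample points below are hypothetical:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two hypothetical people: [age, income]
X = np.array([[25.0, 50_000.0],
              [60.0, 52_000.0]])

# Raw distance is driven almost entirely by the income gap (2,000),
# even though the relative difference in age (35 years) is far larger.
raw_dist = np.linalg.norm(X[0] - X[1])

# After standardization, each feature contributes on a comparable scale.
X_scaled = StandardScaler().fit_transform(X)
scaled_dist = np.linalg.norm(X_scaled[0] - X_scaled[1])
```

Before scaling, the distance is roughly 2,000 (income units); after scaling, age and income contribute equally to it.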
Two Minute Drill
- Feature scaling brings all features to similar ranges.
- Normalization (MinMaxScaler) → [0,1] range.
- Standardization (StandardScaler) → mean 0, std 1.
- Tree‑based models do not require scaling.
Need more clarification?
Drop us an email at career@quipoinfotech.com
