Dimensionality Reduction (PCA)

Principal Component Analysis (PCA) reduces the number of features while preserving as much variance as possible. It projects data onto a lower‑dimensional space along directions of maximum variation.

PCA finds new axes (principal components) that capture the most variance in the data.

Why Reduce Dimensions?

  • Visualize high‑dimensional data in 2D or 3D.
  • Reduce noise and remove redundant features.
  • Speed up training (fewer features).
  • Mitigate overfitting (curse of dimensionality).
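As a minimal sketch of the visualization use case (using scikit-learn's bundled digits dataset as an illustrative choice), 64-dimensional images can be projected down to two coordinates suitable for a scatter plot:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 1797 handwritten-digit images, each flattened to 64 pixel features
X, y = load_digits(return_X_y=True)

# Project all 64 features onto the top 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                             # (1797, 2) -- ready to scatter-plot
print(pca.explained_variance_ratio_.sum())    # fraction of variance kept by 2 axes
```

Plotting `X_2d` colored by `y` typically shows the digit classes forming visible clusters, even though only two of the original 64 dimensions' worth of variance survive.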

How PCA Works (Intuition)

PCA first finds the direction (the first principal component) along which the data varies most. It then finds a second direction, orthogonal to the first, that captures the most remaining variance, and so on. Finally, the data are projected onto the top k components.

from sklearn.decomposition import PCA

pca = PCA(n_components=2) # reduce to 2 dimensions
X_pca = pca.fit_transform(X)

print(pca.explained_variance_ratio_) # variance captured by each component
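Under the hood, the steps described above amount to centering the data and eigendecomposing its covariance matrix. A minimal from-scratch sketch (on synthetic data, since no dataset is specified here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # synthetic data: 200 samples, 5 features

# 1. Center the data (PCA directions pass through the mean)
Xc = X - X.mean(axis=0)

# 2. Eigendecompose the covariance matrix
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order

# 3. Sort components by variance (eigenvalue), largest first
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]      # top-2 principal axes, shape (5, 2)

# 4. Project the centered data onto the top components
X_proj = Xc @ components                # shape (200, 2)

# Eigenvalues divided by their total give the explained-variance ratios
explained = eigvals[order] / eigvals.sum()
```

Running `PCA(n_components=2).fit_transform(X)` on the same data gives the same projection up to the sign of each axis, since an eigenvector's direction is only defined up to a flip.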

Choosing Number of Components

Plot the cumulative explained variance ratio and choose enough components to reach 90-95% of the total variance.

import numpy as np
import matplotlib.pyplot as plt

pca = PCA()
pca.fit(X)
cumsum = np.cumsum(pca.explained_variance_ratio_)
plt.plot(range(1, len(cumsum) + 1), cumsum, marker='o')
plt.xlabel('Number of components')
plt.ylabel('Cumulative explained variance')
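Rather than reading the threshold off the plot by hand, scikit-learn lets you pass a float between 0 and 1 as `n_components`, and it keeps just enough components to reach that fraction of variance (illustrated here on the bundled digits dataset):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)     # 64 features per sample

# A float n_components means: keep the smallest number of components
# whose cumulative explained variance reaches this fraction
pca = PCA(n_components=0.95)
X_red = pca.fit_transform(X)

print(X_red.shape[1], "components retained")
print(pca.explained_variance_ratio_.sum())  # at least 0.95
```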

Important: Scale Features First

PCA is sensitive to feature scales: a feature measured in large units contributes more variance and can dominate the components. Always standardize (e.g., with StandardScaler) before applying PCA.
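A convenient way to guarantee scaling happens first is to chain the two steps in a pipeline. A sketch using scikit-learn's wine dataset (chosen here because its features span very different scales):

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)   # 13 features, e.g. proline ~1000x flavanoids

# Standardize each feature to zero mean / unit variance, then apply PCA
pipe = make_pipeline(StandardScaler(), PCA(n_components=2))
X_pca = pipe.fit_transform(X)
```

Without the scaler, the largest-unit features would dominate the first component; the pipeline also applies the same scaling automatically when transforming new data.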


Two Minute Drill
  • PCA reduces dimensions while preserving variance.
  • Use for visualization, noise reduction, speed.
  • Scale features before PCA.
  • Choose components by cumulative explained variance.

Need more clarification?

Drop us an email at career@quipoinfotech.com