Dimensionality Reduction (PCA)
Principal Component Analysis (PCA) reduces the number of features while preserving as much variance as possible. It projects data onto a lower‑dimensional space along directions of maximum variation.
PCA finds new axes (principal components) that capture the most variance in the data.
Why Reduce Dimensions?
- Visualize high‑dimensional data in 2D or 3D.
- Reduce noise and remove redundant features.
- Speed up training (fewer features).
- Mitigate overfitting (curse of dimensionality).
How PCA Works (Intuition)
Find the direction (the first principal component) along which the data varies most. Then find a second direction, orthogonal to the first, that captures the most remaining variance, and so on. Finally, project the data onto the top k components.
from sklearn.decomposition import PCA
pca = PCA(n_components=2) # reduce to 2 dimensions
X_pca = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # variance captured by each component
Choosing Number of Components
Plot cumulative explained variance ratio. Choose enough components to reach 90‑95% variance.
import numpy as np
import matplotlib.pyplot as plt

pca = PCA()
pca.fit(X)
cumsum = np.cumsum(pca.explained_variance_ratio_)  # cumulative explained variance
plt.plot(range(1, len(cumsum) + 1), cumsum, marker='o')
Important: Scale Features First
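Instead of reading the threshold off the plot by eye, scikit-learn's PCA also accepts a float between 0 and 1 for n_components, in which case it automatically keeps the smallest number of components whose cumulative explained variance reaches that fraction. A minimal sketch (the random toy data here is only for illustration):

```python
from sklearn.decomposition import PCA
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # toy data, 10 features (illustrative only)

# A float in (0, 1) tells scikit-learn to keep just enough
# components to explain that fraction of the total variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape[1])                         # components chosen automatically
print(pca.explained_variance_ratio_.sum())        # >= 0.95 by construction
```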
PCA is sensitive to feature scales. Always standardize (StandardScaler) before applying PCA.
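The scaling step above is easy to combine with PCA using a scikit-learn pipeline, so standardization is applied automatically before the projection. A minimal sketch (the toy data with deliberately mismatched feature scales is an assumption for illustration):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import numpy as np

rng = np.random.default_rng(0)
# Toy data: three features with wildly different scales (illustrative only)
X = np.column_stack([
    rng.normal(0, 1, 200),      # unit scale
    rng.normal(0, 1000, 200),   # would dominate unscaled PCA
    rng.normal(0, 0.01, 200),   # would be ignored by unscaled PCA
])

# StandardScaler runs first, so PCA sees zero-mean, unit-variance features
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
X_pca = pipeline.fit_transform(X)
print(X_pca.shape)  # (200, 2)
```

Without the scaler, the second feature's large variance would dominate the first principal component regardless of any real structure in the data.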
Two Minute Drill
- PCA reduces dimensions while preserving variance.
- Use for visualization, noise reduction, speed.
- Scale features before PCA.
- Choose components by cumulative explained variance.
Need more clarification?
Drop us an email at career@quipoinfotech.com
