Clustering with K-Means
Clustering is an unsupervised learning task: we group similar data points together without any labels. K‑means is one of the most widely used clustering algorithms.
K‑means partitions data into k clusters, each represented by its centroid (average point).
How K‑means Works
1. Choose k (number of clusters).
2. Randomly initialize k centroids.
3. Assign each point to the nearest centroid.
4. Update centroids as mean of assigned points.
5. Repeat steps 3‑4 until convergence (centroids stop moving).
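The steps above can be sketched in plain NumPy. This is a minimal illustration of the algorithm, not scikit-learn's optimized implementation; the function name `kmeans_simple` and the toy data are made up for the example.

```python
import numpy as np

def kmeans_simple(X, k, n_iters=100, seed=0):
    """Minimal K-means: random init, assign, update, repeat."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop when the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy data: two well-separated blobs in 2-D
X = np.array([[0., 0.], [0., 1.], [1., 0.],
              [10., 10.], [10., 11.], [11., 10.]])
labels, centroids = kmeans_simple(X, k=2)
```

Note this sketch does not handle the edge case where a centroid ends up with no assigned points; production implementations reinitialize such centroids.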
from sklearn.cluster import KMeans

# X is an (n_samples, n_features) array of scaled features
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(X)
labels = kmeans.labels_              # cluster index for each point
centroids = kmeans.cluster_centers_  # (k, n_features) array of centroids

Choosing k – The Elbow Method
Compute inertia (sum of squared distances to nearest centroid) for different k. Plot inertia vs k – look for an "elbow" where inertia stops decreasing rapidly.
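To see exactly what inertia measures, it can be recomputed by hand from the fitted model. The toy data below is made up for illustration; the comparison against `kmeans.inertia_` should hold for any data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two obvious groups in 2-D (hypothetical values)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.1]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Inertia = sum of squared distances from each point to its assigned centroid
assigned_centers = kmeans.cluster_centers_[kmeans.labels_]
manual_inertia = ((X - assigned_centers) ** 2).sum()

print(np.isclose(manual_inertia, kmeans.inertia_))  # True
```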
import matplotlib.pyplot as plt

inertias = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(X)
    inertias.append(kmeans.inertia_)

plt.plot(range(1, 11), inertias, marker='o')
plt.xlabel('k')
plt.ylabel('Inertia')
plt.show()

When to Use K‑means
K‑means works well when clusters are roughly spherical, similarly sized, and well separated. It struggles with irregular cluster shapes and clusters of different densities. Because it relies on Euclidean distance, scale your features before clustering so that no single feature dominates.
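A minimal sketch of scaling before clustering, using scikit-learn's StandardScaler. The income/age data is hypothetical, chosen because the two features have very different scales:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical data: annual income vs. age — very different scales
X = np.array([[30000., 25.], [32000., 27.], [31000., 62.],
              [90000., 26.], [91000., 61.], [89500., 60.]])

# Without scaling, income would dominate the Euclidean distances
X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
```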
Two Minute Drill
- K‑means groups data into k clusters.
- Use elbow method to choose k.
- Scale features before clustering.
- Works best for spherical clusters.
Need more clarification?
Drop us an email at career@quipoinfotech.com
