Clustering with K-Means
Clustering is an unsupervised learning task: we group similar data points together without any labels. K‑means is one of the most widely used clustering algorithms.
K‑means partitions data into k clusters, each represented by its centroid (average point).
How K‑means Works
1. Choose k (number of clusters).
2. Randomly initialize k centroids.
3. Assign each point to the nearest centroid.
4. Update centroids as mean of assigned points.
5. Repeat steps 3‑4 until convergence (centroids stop moving).
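The steps above can be sketched in plain NumPy. This is a minimal illustration of the algorithm, not scikit-learn's optimized implementation; the function name `kmeans_simple` and the toy data are made up for the example.

```python
import numpy as np

def kmeans_simple(X, k, n_iters=100, seed=0):
    """Minimal K-means: random init, assign, update, repeat."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop when the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy data: two well-separated blobs in 2-D
X = np.array([[0., 0.], [0., 1.], [1., 0.],
              [10., 10.], [10., 11.], [11., 10.]])
labels, centroids = kmeans_simple(X, k=2)
```

Note this sketch does not handle the edge case where a centroid ends up with no assigned points; production implementations reinitialize such centroids.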
from sklearn.cluster import KMeans

# X is an (n_samples, n_features) array of scaled features
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(X)
labels = kmeans.labels_              # cluster index for each point
centroids = kmeans.cluster_centers_  # (k, n_features) array of centroids

Choosing k – The Elbow Method
Compute inertia (sum of squared distances to nearest centroid) for different k. Plot inertia vs k – look for an "elbow" where inertia stops decreasing rapidly.
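To see exactly what inertia measures, it can be recomputed by hand from the fitted model. The toy data below is made up for illustration; the comparison against `kmeans.inertia_` should hold for any data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two obvious groups in 2-D (hypothetical values)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.1]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Inertia = sum of squared distances from each point to its assigned centroid
assigned_centers = kmeans.cluster_centers_[kmeans.labels_]
manual_inertia = ((X - assigned_centers) ** 2).sum()

print(np.isclose(manual_inertia, kmeans.inertia_))  # True
```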
import matplotlib.pyplot as plt

inertias = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(X)
    inertias.append(kmeans.inertia_)

plt.plot(range(1, 11), inertias, marker='o')
plt.xlabel('k')
plt.ylabel('Inertia')
plt.show()

When to Use K‑means
K‑means works well when clusters are roughly spherical, similarly sized, and well separated. It struggles with irregular cluster shapes and clusters of different densities. Because it relies on Euclidean distance, scale your features before clustering so that no single feature dominates.
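A minimal sketch of scaling before clustering, using scikit-learn's StandardScaler. The income/age data is hypothetical, chosen because the two features have very different scales:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical data: annual income vs. age — very different scales
X = np.array([[30000., 25.], [32000., 27.], [31000., 62.],
              [90000., 26.], [91000., 61.], [89500., 60.]])

# Without scaling, income would dominate the Euclidean distances
X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
```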
Two Minute Drill
- K‑means groups data into k clusters.
- Use elbow method to choose k.
- Scale features before clustering.
- Works best for spherical clusters.
Need more clarification?
Drop us an email at career@quipoinfotech.com
