Hierarchical Clustering
Hierarchical clustering builds a tree of clusters (a dendrogram) without pre‑specifying the number of clusters: the tree can be cut at any level to obtain a desired number of clusters. It is useful for exploring data structure and works best on small datasets.
Agglomerative (Bottom‑Up) Approach
1. Start with each point as its own cluster.
2. Find the two closest clusters and merge them.
3. Repeat until only one cluster remains.
4. The sequence of merges is shown in a dendrogram.
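The steps above can be sketched directly in code. Below is a minimal, illustrative single-linkage implementation in plain NumPy (the toy `X` and the `agglomerate` helper are assumptions for demonstration, not a production algorithm):

```python
import numpy as np

def agglomerate(X, n_clusters):
    """Naive single-linkage agglomerative clustering (illustrative, O(n^3))."""
    # Step 1: every point starts as its own cluster
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        # Step 2: find the pair of clusters with the smallest
        # minimum point-to-point distance (single linkage)
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        # Step 3: merge the closest pair and repeat
        clusters[a] += clusters.pop(b)
    return clusters

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]])
print(agglomerate(X, 3))  # → [[0, 1], [2, 3], [4]]
```

In practice you would use scikit‑learn, as shown next, which implements the same idea far more efficiently.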
from sklearn.cluster import AgglomerativeClustering

# Stop merging once 3 clusters remain; X is the feature matrix
cluster = AgglomerativeClustering(n_clusters=3)
labels = cluster.fit_predict(X)

Linkage Criteria
- Single linkage: distance between closest points of two clusters (tends to create chains).
- Complete linkage: distance between farthest points (tends to create compact clusters).
- Average linkage: average distance between all pairs.
- Ward linkage: minimizes variance within clusters (most popular).
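In scikit‑learn the criterion is chosen with the `linkage` parameter of `AgglomerativeClustering`. A short sketch comparing all four on synthetic data (the blob parameters are assumptions for illustration):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Toy data: three separated blobs (illustrative)
X, y = make_blobs(n_samples=60, centers=3, cluster_std=0.5, random_state=0)

# The `linkage` parameter selects the merge criterion ('ward' is the default)
for link in ['ward', 'complete', 'average', 'single']:
    labels = AgglomerativeClustering(n_clusters=3, linkage=link).fit_predict(X)
    sizes = sorted(np.bincount(labels))
    print(f'{link:>8}: cluster sizes {sizes}')
```

On well‑separated data all four criteria give similar results; the differences show up on elongated or noisy clusters, where single linkage tends to chain and Ward favors compact, similarly sized groups.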
Plotting a Dendrogram
Use `scipy` to visualize the hierarchical tree. The height of merges indicates dissimilarity.
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Compute the merge tree with Ward linkage, then draw it
linked = linkage(X, method='ward')
dendrogram(linked)
plt.show()

When to Use Hierarchical Clustering
Use hierarchical clustering on small datasets (up to a few thousand points) where the dendrogram offers insight into structure. It is not suitable for very large datasets: the naive algorithm runs in O(n³) time and stores an O(n²) distance matrix.
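Once the tree is built, it can be cut to yield any number of clusters without refitting. A sketch using scipy's `fcluster` (the two‑group synthetic data is an assumption for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two well-separated groups of points (illustrative)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])

linked = linkage(X, method='ward')
# Cut the tree so that at most 2 clusters remain
labels = fcluster(linked, t=2, criterion='maxclust')
print(sorted(set(labels)))  # → [1, 2]
```

The same `linked` array can be cut again with a different `t` (or with `criterion='distance'` and a height threshold) to get a different clustering for free.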
Two Minute Drill
- Hierarchical clustering builds a dendrogram.
- Agglomerative clustering starts with single points and merges.
- Linkage criteria (Ward, complete, single) affect cluster shapes.
- Good for exploring structure in small datasets.
Need more clarification?
Drop us an email at career@quipoinfotech.com
