w3resource

Generate random data and perform clustering using SciPy's hierarchical clustering


NumPy: Integration with SciPy Exercise-11 with Solution


Write a NumPy program to generate random data and perform clustering using SciPy's hierarchical clustering methods.

Sample Solution:

Python Code:

import numpy as np  # Import NumPy library
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster  # Import hierarchical clustering functions
import matplotlib.pyplot as plt  # Import matplotlib for plotting

# Generate random data
np.random.seed(0)  # Seed for reproducibility
data = np.random.randn(50, 2)  # Generate 50 2D points

# Perform hierarchical clustering using the linkage method
Z = linkage(data, method='ward')

# Create dendrogram plot
plt.figure(figsize=(10, 7))
dendrogram(Z)
plt.title("Dendrogram for Hierarchical Clustering")
plt.xlabel("Index")
plt.ylabel("Distance")
plt.show()

# Form flat clusters with a distance threshold
max_d = 1.5  # Maximum distance for clusters
clusters = fcluster(Z, max_d, criterion='distance')

# Print the cluster assignments
print("Cluster Assignments for Each Point:")
print(clusters)

# Plot the data points with cluster assignments
plt.figure(figsize=(10, 7))
plt.scatter(data[:, 0], data[:, 1], c=clusters, cmap='prism')
plt.title("Data Points with Cluster Assignments")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

Output:

Cluster Assignments for Each Point:
[ 7 10  6  7 12 11  7  7  7  8  2  9  6  8 10 12  5 12 10  4  5  3  4  1
  1  1  5 12 12  4  4  5  8  1  1  7  9  8  4 12  1  9 10  7  1 11 12 12
 11 12]
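
The raw label array is easier to read once it is summarized. The following is a minimal sketch, assuming the same seed, data, and threshold as the sample solution, that counts the flat clusters and their sizes with np.unique. The variable names labels and sizes are illustrative and not part of the original solution.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Rebuild the same data and hierarchy as the sample solution
np.random.seed(0)
data = np.random.randn(50, 2)
Z = linkage(data, method='ward')
clusters = fcluster(Z, 1.5, criterion='distance')

# Unique labels and how many points carry each one
labels, sizes = np.unique(clusters, return_counts=True)
print("Number of clusters:", labels.size)
for label, size in zip(labels, sizes):
    print(f"Cluster {label}: {size} points")
```

Because fcluster numbers its clusters from 1, the largest label equals the number of clusters found.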

Explanation:

  • Import Libraries:
    • Import NumPy for generating random data.
    • Import hierarchical clustering functions from SciPy: dendrogram, linkage, and fcluster.
    • Import Matplotlib for plotting.
  • Generate random data:
    • Use np.random.seed(0) for reproducibility.
    • Generate 50 random 2D points using np.random.randn(50, 2), which draws each coordinate from a standard normal distribution.
  • Perform hierarchical clustering:
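    • Call SciPy's linkage function with method='ward' on the generated data. Ward's method merges, at each step, the pair of clusters that gives the smallest increase in within-cluster variance. The function returns the linkage matrix Z, which records every merge.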
  • Create Dendrogram Plot:
    • Call SciPy's dendrogram function, which draws the merge tree encoded in Z on the current Matplotlib figure. Label the x-axis as "Index" and the y-axis as "Distance".
  • Form Flat Clusters:
    • Use the fcluster function with criterion='distance' to cut the tree at max_d = 1.5, so that the points in each flat cluster have a cophenetic distance of at most 1.5 from one another.
  • Print Cluster Assignments:
    • Print the array of cluster labels, one integer (starting at 1) for each data point.
  • Plot Data Points with Cluster Assignments:
    • Use Matplotlib to create a scatter plot of the data points, coloring them based on their cluster assignments. Label the x-axis as "Feature 1" and the y-axis as "Feature 2".
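
The steps above rely on the linkage matrix Z without showing what it contains. The sketch below, which assumes the same seed and data as the sample solution, inspects Z and then cuts the tree with criterion='maxclust' as an alternative to a distance threshold. The value k = 4 is an arbitrary choice for illustration and does not come from the original exercise.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Rebuild the same data and hierarchy as the sample solution
np.random.seed(0)
data = np.random.randn(50, 2)
Z = linkage(data, method='ward')

# Z has one row per merge, so n - 1 rows for n points.
# Columns: [index of cluster a, index of cluster b,
#           distance between them, number of points in the merged cluster]
print("Shape of Z:", Z.shape)
print("First three merges:")
print(Z[:3])
print("Points in the final merge:", int(Z[-1, 3]))

# Alternative cut: request at most k flat clusters instead of a distance
k = 4  # illustrative value, not from the original exercise
labels_k = fcluster(Z, t=k, criterion='maxclust')
print("Clusters found:", np.unique(labels_k).size)
```

Asking for a cluster count can be more convenient than choosing max_d when the scale of the merge distances is not known in advance; the dendrogram remains the usual way to pick either value.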
