5/30/2022

Find the optimal number of clusters using silhouette evaluation

 

This finds the optimal number of clusters using the silhouette metric.

The code evaluates the clustering result of the KMeans algorithm for each number of clusters k.

Then it shows the values as a figure.

A larger silhouette value means a better result.
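
To see why a larger value is better, here is a small sketch that computes the silhouette value by hand and compares it with silhouette_samples. For each point, a is the mean distance to the other points in its own cluster and b is the mean distance to the nearest other cluster, and the value is (b - a) / max(a, b). The four points, the labels and the helper name silhouette_by_hand are made up for illustration only.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# Toy data (assumption): two tight pairs of points that are far apart.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
labels = np.array([0, 0, 1, 1])

def silhouette_by_hand(X, labels):
    """Hypothetical helper: per-sample silhouette with Euclidean distance.

    Assumes every cluster has at least two points.
    """
    # Pairwise Euclidean distances between all points
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # do not count the point itself
        a = dist[i, same].mean()  # mean distance inside own cluster
        # mean distance to the nearest other cluster
        b = min(dist[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores

by_hand = silhouette_by_hand(X, labels)
reference = silhouette_samples(X, labels, metric='euclidean')
print(by_hand)
print(reference)
```

Points that are close to their own cluster and far from the other one get a value near 1, which is why the larger mean value is taken as the better clustering.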

..

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples
import numpy as np
import matplotlib.pyplot as plt

# cluster_df is the feature table to cluster; it must be defined before this runs.
silhouette_vals = []
sk, ek = 2, 20  # smallest and largest k to try
for i in range(sk, ek + 1):  # ek + 1 so that k = ek is evaluated too
    kmeans_plus = KMeans(n_clusters=i, init='k-means++')
    pred = kmeans_plus.fit_predict(cluster_df)
    # Mean silhouette value over all samples for this k
    silhouette_vals.append(np.mean(silhouette_samples(cluster_df, pred, metric='euclidean')))

plt.plot(range(sk, ek + 1), silhouette_vals, marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Silhouette')
plt.show()

..

For example, here k = 20 has the largest silhouette value, so it is the best clustering result.
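
Instead of reading the best k from the figure, it can also be picked in code. This is only a sketch under assumptions: the data is synthetic (make_blobs with four hand-picked centers stands in for cluster_df), the k range 2 to 8 is arbitrary, and silhouette_score is used because it returns the mean of the per-sample silhouette values directly.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for cluster_df (assumption): four tight, well-separated blobs.
centers = [[0, 0], [10, 10], [-10, 10], [10, -10]]
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=0.5, random_state=0)

k_values = list(range(2, 9))  # arbitrary range of k to try
scores = []
for k in k_values:
    model = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=0)
    labels = model.fit_predict(X)
    # Mean silhouette value over all samples for this k
    scores.append(silhouette_score(X, labels, metric='euclidean'))

# The k with the largest mean silhouette value
best_k = k_values[int(np.argmax(scores))]
print(best_k)
```

With real data the curve is often flatter than in this toy case, so it is still worth looking at the figure before trusting a single maximum.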


Thank you.

