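Finding the optimal number of clusters using the silhouette metric
This evaluates the clustering produced by the KMeans algorithm for each number of clusters k
and plots the mean silhouette score against k.
The score ranges from -1 to 1; a larger value indicates a better clustering.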
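To see why a larger value is better, here is a minimal sketch that computes the silhouette coefficient of each point by hand and compares it with scikit-learn. For a point, a is the mean distance to the other points in its own cluster and b is the mean distance to the points of the nearest other cluster; the coefficient is (b - a) / max(a, b). The toy data and the helper name silhouette_by_hand are made up for illustration and are not part of the original snippet.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# Toy 1-D data (assumption): two tight, well-separated clusters.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])

def silhouette_by_hand(X, labels, i):
    """Silhouette coefficient of point i (assumes every cluster has >= 2 points)."""
    dists = np.linalg.norm(X - X[i], axis=1)   # Euclidean distance to every point
    same = labels == labels[i]
    same[i] = False                            # exclude the point itself
    a = dists[same].mean()                     # mean intra-cluster distance
    # mean distance to the nearest other cluster
    b = min(dists[labels == c].mean() for c in set(labels.tolist()) if c != labels[i])
    return (b - a) / max(a, b)

manual = np.array([silhouette_by_hand(X, labels, i) for i in range(len(X))])
library = silhouette_samples(X, labels, metric='euclidean')
print(manual)
print(library)
```

Values close to 1 mean a point sits much nearer to its own cluster than to any other, which is why a higher mean over all points suggests a better choice of k.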
..
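from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples
import numpy as np
import matplotlib.pyplot as plt

# cluster_df: feature matrix (e.g. a pandas DataFrame) prepared beforehand
silhouette_vals = []
sk, ek = 2, 20  # smallest and largest k to try
k_values = range(sk, ek + 1)  # ek + 1 so that k = ek is evaluated too
for k in k_values:
    kmeans_plus = KMeans(n_clusters=k, init='k-means++')
    pred = kmeans_plus.fit_predict(cluster_df)
    # mean of the per-sample silhouette coefficients = overall silhouette score
    silhouette_vals.append(np.mean(silhouette_samples(cluster_df, pred, metric='euclidean')))

# plot the mean silhouette score for each k
plt.plot(k_values, silhouette_vals, marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Silhouette')
plt.show()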
..
In this example, k = 20 gives the best clustering result (the highest mean silhouette score).
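Instead of reading the best k off the figure, it can also be picked programmatically. The following is a minimal, self-contained sketch under stated assumptions: the synthetic make_blobs data stands in for cluster_df, the k range 2 to 10 is arbitrary, and silhouette_score is used as a shortcut for the mean of silhouette_samples.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for cluster_df (assumption): four well-separated blobs.
centers = [[-10, -10], [-10, 10], [10, -10], [10, 10]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=1.0, random_state=0)

k_values = list(range(2, 11))
scores = []
for k in k_values:
    model = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=0)
    labels = model.fit_predict(X)
    # silhouette_score returns the mean silhouette coefficient over all samples
    scores.append(silhouette_score(X, labels, metric='euclidean'))

# k with the highest mean silhouette score
best_k = k_values[int(np.argmax(scores))]
print("best k:", best_k)
```

Fixing random_state makes the run repeatable; without it, KMeans can give slightly different scores on each run, so the best k may shift when several values score similarly.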