NLP/K-MEANS/PYTHON
Hi all,
I'm currently working on a short-text clustering task in NLP, and I'm trying to cluster the short texts with K-means.
I have embedded the sentences (using GloVe), fed them to a CNN, and then used K-means to do the clustering.
Most (or maybe all) online tutorials only show how to plot the clustering results; none of them explain how to print out the sentences/documents in each cluster. I have figured out how to print the sentences in each cluster (I'm using Python).
My questions are:
- How can I print out the sentence/document at the center point of each cluster?
- How can I print out the sentences ordered by their distance to the center point of their cluster?
Can anyone help me on this issue?
Many thanks in advance!
My code:
    # print centers of the clusters
    centers = kmeans.cluster_centers_
    centroidpoint = pca.transform(centers)
    print("Centers- Kmeans")
    print(centers)

The output looks like this:

    Centers- Kmeans
    [[0.0752584 0.08675878 0.03207847 ... 0.10317419 0.07130289 0.0322413 ]
     [0.06198343 0.07327988 0.05582789 ... 0.10588244 0.0630549 0.03647455]
     ...

How can I find the sentence at the center point of each cluster, instead of just printing the vector values of the cluster center?
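One point worth keeping in mind: a K-means center is the mean of its members, so it usually does not coincide with any actual sentence. A common workaround is to report the sentence whose vector lies nearest to each center. The sketch below assumes the vectors passed to `kmeans.fit` are still available; the name `features` and the helper `closest_sentence_per_cluster` are hypothetical and not part of the code in this post.

```python
import numpy as np

def closest_sentence_per_cluster(features, centers, sentences):
    """For each cluster center, return (row index, sentence) of the nearest vector.

    features : (n_sentences, n_dims) vectors that were clustered (assumed name)
    centers  : (n_clusters, n_dims), e.g. kmeans.cluster_centers_
    sentences: sequence of texts aligned row by row with `features`
    """
    features = np.asarray(features, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Euclidean distance from every center to every sentence vector,
    # giving an array of shape (n_clusters, n_sentences).
    dists = np.linalg.norm(centers[:, None, :] - features[None, :, :], axis=2)
    nearest_idx = dists.argmin(axis=1)
    return [(int(i), sentences[int(i)]) for i in nearest_idx]

# Toy data standing in for the real vectors and centers.
toy_features = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.5, 4.8]]
toy_centers = [[0.05, 0.0], [5.4, 4.8]]
toy_sentences = ["a", "b", "c", "d"]
for idx, text in closest_sentence_per_cluster(toy_features, toy_centers, toy_sentences):
    print(idx, text)
```

With the variables from this post, the call would presumably be `closest_sentence_per_cluster(features, kmeans.cluster_centers_, data1)`, provided `features` is in the same space as the centers (not the PCA-reduced points). The intermediate array holds one entry per center, sentence and dimension, so for very large datasets a loop over centers may be gentler on memory.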
    # print out the sentences in each cluster
    centroid_list = kmeans.cluster_centers_
    labels = kmeans.labels_
    n_clusters_ = len(centroid_list)
    # print "cluster centroids:",centroid_list
    print(labels)

    # collect the row indices that belong to each cluster
    cluster_menmbers_list = []
    for i in range(0, n_clusters_):
        menmbers_list = []
        for j in range(0, len(labels)):
            if labels[j] == i:
                menmbers_list.append(j)
        cluster_menmbers_list.append(menmbers_list)
    # print cluster_menmbers_list

    # print the sentences cluster by cluster
    for i in range(0, len(cluster_menmbers_list)):
        print("CLUSTER" + " " + str(i) + ':')
        for j in range(0, len(cluster_menmbers_list[i])):
            a = cluster_menmbers_list[i][j]
            print(data1[a])

The output looks like this:

    CLUSTER 0:
    sentence1
    sentence2
    sentence3
    ...
    CLUSTER 1:
    sentence1
    sentence2
    sentence3

But these sentences are not ordered by their distance to the center of the cluster, so they look very dispersed...
How can I print out, say, the top 20 or top 30 sentences that are nearest to the center of each cluster?
Many thanks in advance!
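One way the ordering could be done is to compute, for each cluster, the Euclidean distance from every member vector to that cluster's center, sort the members by it, and keep the first N. This is only a sketch under assumptions: `features` stands for the array that was passed to `kmeans.fit`, and `top_sentences_by_cluster` is a hypothetical helper, not something from the code above.

```python
import numpy as np

def top_sentences_by_cluster(features, labels, centers, sentences, top_n=20):
    """Return {cluster id: [(sentence, distance), ...]} sorted by distance.

    features : (n_sentences, n_dims) vectors that were clustered (assumed name)
    labels   : cluster id per sentence, e.g. kmeans.labels_
    centers  : (n_clusters, n_dims), e.g. kmeans.cluster_centers_
    sentences: sequence of texts aligned row by row with `features`
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    centers = np.asarray(centers, dtype=float)
    ranked = {}
    for k, center in enumerate(centers):
        # Row indices of the sentences assigned to cluster k.
        member_idx = np.where(labels == k)[0]
        # Distance of each member to its own cluster center.
        dists = np.linalg.norm(features[member_idx] - center, axis=1)
        # Positions of the top_n smallest distances, nearest first.
        order = np.argsort(dists)[:top_n]
        ranked[k] = [(sentences[int(member_idx[i])], float(dists[i])) for i in order]
    return ranked

# Toy data standing in for the real vectors, labels and centers.
toy_features = [[0, 0], [1, 0], [3, 0], [10, 0], [11, 0]]
toy_labels = [0, 0, 0, 1, 1]
toy_centers = [[1, 0], [10.2, 0]]
toy_sentences = ["a", "b", "c", "d", "e"]
for k, members in top_sentences_by_cluster(
        toy_features, toy_labels, toy_centers, toy_sentences, top_n=2).items():
    print("CLUSTER " + str(k) + ":")
    for text, dist in members:
        print(text, round(dist, 3))
```

With the variables from this post, the call would presumably look like `top_sentences_by_cluster(features, kmeans.labels_, kmeans.cluster_centers_, data1, top_n=20)`. In scikit-learn, `kmeans.transform(features)` also returns the distance of every sample to every center, which could replace the manual distance computation.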
https://stackoverflow.com/questions/66477417/how-to-print-out-the-short-text-by-their-distance-to-center-point-in-each-cluste March 04, 2021 at 10:48PM