12-04-2016 08:19 AM
How can I estimate the most important features in each cluster after applying the k-means clustering algorithm? I need to cluster the customers of retail shops based on the products they purchased. As a result, I need both the customers belonging to each cluster and, for each cluster, the products that most influence it (e.g. in cluster A, among all products, the customers mostly purchase meat, bread, milk, etc.). I'm going to use Apache Spark MLlib.
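To make the desired output concrete, here is a minimal sketch in plain Python (not Spark code) of one possible reading of "most influential products": rank the products in each cluster by how far the cluster centroid lies above the overall average. It assumes the centroids are already available, for example from the cluster centers of a fitted k-means model. The names `products`, `centers`, `global_mean` and `top_products` are hypothetical, and all the numbers are made up for illustration.

```python
# Hypothetical feature columns: one per product.
products = ["meat", "bread", "milk", "wine", "fish"]

# Made-up centroids, one row per cluster, in the same column order as `products`.
# In practice these would come from the fitted k-means model.
centers = [
    [0.9, 0.8, 0.7, 0.1, 0.0],  # cluster 0
    [0.1, 0.2, 0.1, 0.9, 0.7],  # cluster 1
]

# Made-up average purchase rate of each product over all customers.
global_mean = [0.5, 0.5, 0.5, 0.5, 0.4]


def top_products(center, names, overall_mean, n=3):
    """Return the n products whose centroid value most exceeds the overall mean."""
    scored = [(c - m, name) for c, m, name in zip(center, overall_mean, names)]
    # Sort by score only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:n]]


for cluster_id, center in enumerate(centers):
    print(cluster_id, top_products(center, products, global_mean))
```

Ranking by the difference from the overall mean, rather than by the raw centroid value, is a design choice in this sketch: it avoids listing products that almost every customer buys as "important" for every cluster.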