Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Apache Mahout K-Means Algorithm in HDP 2.4 on Hortonworks Sandbox with MapReduce!

Frequent Visitor

Hello,

I am studying the Hortonworks Data Platform and I need to run the K-Means algorithm. I know that K-Means is included in Mahout, but the problem is that I don't know how to execute this algorithm, and I don't know how to load any data into the HDP Sandbox.

Can someone help me?

Thank you very much.

David.

1 ACCEPTED SOLUTION

Master Guru

Hello David ,

Does it have to be Mahout? In general, Spark MLlib is just quite a bit "cooler" now. Here is its web page, with example code. (If it has to be Mahout, I am sure someone can help with that too.)

http://spark.apache.org/docs/latest/mllib-clustering.html
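As a side note, whichever library you compare, the algorithm itself is small enough to sketch in plain Python. This is only a toy illustration of what K-Means does (alternate between assigning points to the nearest centroid and moving each centroid to its cluster mean); it is not the MLlib or Mahout API, and the sample points are made up:

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Toy K-Means on 2-D points: repeatedly assign each point to its
    nearest centroid, then recompute each centroid as its cluster's mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: group points by nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p[0] - centroids[i][0]) ** 2
                                                + (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each non-empty centroid to its cluster's mean.
        for i, c in enumerate(clusters):
            if c:
                centroids[i] = (sum(p[0] for p in c) / len(c),
                                sum(p[1] for p in c) / len(c))
    return centroids

# Two obvious clusters, one near (0, 0) and one near (9, 9).
points = [(0.1, 0.1), (0.2, 0.0), (9.0, 9.1), (8.9, 9.0)]
print(sorted(kmeans(points, 2)))  # one centroid near each cluster
```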

Regarding Mahout I suppose you found that one already:

https://mahout.apache.org/users/clustering/k-means-clustering.html
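For the MapReduce side, a Mahout K-Means run from the Sandbox command line looks roughly like this. All paths and the value of k here are hypothetical, and the input must already be in Mahout's vector SequenceFile format (e.g. produced by mahout seqdirectory plus mahout seq2sparse, as the page above describes):

```shell
# Hypothetical HDFS paths. -k tells Mahout to sample k random initial
# centroids into the -c directory, -x caps the iterations, and -cl also
# runs the final point-to-cluster assignment step.
mahout kmeans \
  -i /user/david/vectors/tfidf-vectors \
  -c /user/david/kmeans/initial-clusters \
  -o /user/david/kmeans/output \
  -k 5 -x 10 -cl \
  -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure
```

The results can then be inspected with mahout clusterdump against the output directory.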


3 REPLIES


Frequent Visitor

Hello Benjamin,

the problem is that I need both implementations (Spark and MapReduce) in order to make a comparison. For the Spark task, I don't know how to get the data into the Sandbox to execute the algorithm. The link http://spark.apache.org/docs/latest/mllib-clustering.html

only shows the code, but how can I create a Spark job in the Sandbox?

Can you help me?

Master Guru

Get data into the cluster? The easiest way is to have a delimited file and run hadoop fs -put file <hdfs location>. You can then read those files with sc.textFile.
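Concretely, the upload might look like this from the Sandbox shell (the file name and HDFS directory are hypothetical, so adjust them to your setup):

```shell
# Create a target directory in HDFS and upload a local delimited file.
hadoop fs -mkdir -p /user/david/input
hadoop fs -put data.csv /user/david/input/
# Verify the file landed where expected.
hadoop fs -ls /user/david/input
```

Afterwards, in spark-shell, sc.textFile("/user/david/input/data.csv") gives you an RDD of the file's lines, which you can parse into vectors for the MLlib example.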

I think you should first go through a couple of basic tutorials on working with Hadoop:

http://hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/