Apache Mahout K-Means Algorithm in HDP 2.4 on Hortonworks Sandbox with MapReduce!

Explorer

Hello,

I am studying Hortonworks Data Platform and I need to run the K-Means algorithm. I know that K-Means is available in Mahout, but I don't know how to execute it, and I don't know how to load data into the HDP Sandbox.

Can someone help me?

Thank you very much.

David.

1 ACCEPTED SOLUTION

Master Guru

Hello David,

Does it have to be Mahout? In general, Spark MLlib is quite a bit "cooler" now. Here is its web page, with example code. (If it has to be Mahout, I am sure someone can help too.)

http://spark.apache.org/docs/latest/mllib-clustering.html

Regarding Mahout I suppose you found that one already:

https://mahout.apache.org/users/clustering/k-means-clustering.html

3 REPLIES

Explorer

Hello Benjamin,

The problem is that I need both implementations (Spark and MapReduce) in order to make a comparison. For the Spark task, I don't know how to load the data into the Sandbox to execute the algorithm. The link http://spark.apache.org/docs/latest/mllib-clustering.html only shows the code, but how can I create a Spark job in the Sandbox?

Can you help me?

Master Guru

Get data into the cluster? The easiest way is to have a delimited file and run hadoop fs -put file <hdfs location>. You can then read those files with sc.textFile.
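To make that concrete, here is a rough sketch of what the Spark side could look like once the file is in HDFS. It assumes a comma-delimited file of numeric columns and uses the PySpark MLlib API; the script name, HDFS path, separator, and the parse_line helper are all made up for illustration, so adjust them to your data.

```python
# Minimal sketch: cluster a delimited file from HDFS with Spark MLlib K-Means.
# The HDFS path, the separator, and k are hypothetical placeholders.
import sys


def parse_line(line, sep=","):
    """Turn one delimited text line into a list of floats."""
    return [float(field) for field in line.strip().split(sep)]


def main(hdfs_path, k):
    # Imported here so parse_line can be tried without a Spark installation.
    from pyspark import SparkContext
    from pyspark.mllib.clustering import KMeans

    sc = SparkContext(appName="KMeansExample")
    # Read the uploaded file, skip blank lines, and parse each row.
    lines = sc.textFile(hdfs_path).filter(lambda l: l.strip())
    points = lines.map(parse_line)
    model = KMeans.train(points, k, maxIterations=10)
    for center in model.clusterCenters:
        print(center)
    sc.stop()


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], int(sys.argv[2]))
```

Assuming the script is saved as kmeans_example.py and the data as points.csv (both hypothetical names), you would upload with hadoop fs -put points.csv /tmp/points.csv and then run spark-submit kmeans_example.py /tmp/points.csv 2 from the Sandbox shell.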

I think you should go through a couple of basic tutorials on working with Hadoop:

http://hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/