Apache Mahout K-Means Algorithm in HDP 2.4 on Hortonworks Sandbox with MapReduce!

davidrebe — Fri, 16 Sep 2022 10:19:50 GMT

Hello,

I am studying the Hortonworks Data Platform and I need to run the K-Means algorithm. I know that K-Means is available in Mahout, but I don't know how to execute it, or how to get any data into the HDP Sandbox.

Can someone help me?

Thank you very much.

David.

Re: Apache Mahout K-Means Algorithm in HDP 2.4 on Hortonworks Sandbox with MapReduce!

bleonhardi — Mon, 16 May 2016 19:17:57 GMT

Hello David,

Does it have to be Mahout? In general, Spark MLlib is quite a bit "cooler" now. Here is its documentation page with example code. (If it has to be Mahout, I am sure someone can help too.)

http://spark.apache.org/docs/latest/mllib-clustering.html

Regarding Mahout, I suppose you have already found this one:

https://mahout.apache.org/users/clustering/k-means-clustering.html

Re: Apache Mahout K-Means Algorithm in HDP 2.4 on Hortonworks Sandbox with MapReduce!

davidrebe — Mon, 16 May 2016 21:01:55 GMT

Hello Benjamin,

The problem is that I need both implementations (Spark and MapReduce) in order to make a comparison. For the Spark task, I don't know how to get the data into the Sandbox to run the algorithm. The link http://spark.apache.org/docs/latest/mllib-clustering.html only shows the code, but how can I create a Spark job in the Sandbox?

Can you help me?

Re: Apache Mahout K-Means Algorithm in HDP 2.4 on Hortonworks Sandbox with MapReduce!

bleonhardi — Mon, 16 May 2016 21:38:01 GMT

Get data into the cluster? The easiest way is to have a delimited file and run hadoop fs -put file <hdfs location>. You can then read those files with sc.textFile.
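To make that concrete, here is a minimal PySpark sketch of the whole path: write a small delimited file, upload it, then read it with sc.textFile and cluster it with MLlib. The file name points.csv, the sample values, the helper names and the HDFS directory are all made up for illustration, and run_kmeans assumes pyspark is available on the sandbox, so treat this as a starting point rather than a tested recipe for HDP 2.4.

```python
def write_sample_points(path):
    """Write a small comma-delimited file of 2-D points (made-up sample data)."""
    rows = [(1.0, 1.1), (0.9, 1.0), (8.0, 8.2), (8.1, 7.9)]
    with open(path, "w") as f:
        for row in rows:
            f.write(",".join(str(v) for v in row) + "\n")
    return rows


def parse_point(line):
    """Turn one delimited text line into a list of floats."""
    return [float(x) for x in line.strip().split(",")]


def run_kmeans(hdfs_path, k=2, max_iterations=20):
    """Cluster the points with Spark MLlib and return the cluster centers.

    Assumption: pyspark is installed and a Spark/HDFS environment is reachable.
    The imports live inside the function so the helpers above work without Spark.
    """
    from pyspark import SparkContext
    from pyspark.mllib.clustering import KMeans

    sc = SparkContext(appName="KMeansSketch")  # hypothetical application name
    try:
        # sc.textFile yields one string per line; parse each into a vector
        points = sc.textFile(hdfs_path).map(parse_point)
        model = KMeans.train(points, k, maxIterations=max_iterations)
        return model.clusterCenters
    finally:
        sc.stop()
```

A possible workflow: call write_sample_points("/tmp/points.csv") (or use your own delimited file), upload it with hadoop fs -put /tmp/points.csv <hdfs location>, then call run_kmeans with that HDFS path from a script started via spark-submit. Inside the pyspark shell an sc already exists, so there you would run the textFile and KMeans.train lines directly instead of creating a new SparkContext.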

I think you should go through a couple of basic tutorials on working with Hadoop:

http://hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/