
Apache Mahout K-Means Algorithm in HDP 2.4 on Hortonworks Sandbox with MapReduce!

Solved


New Contributor

Hello,

I am studying the Hortonworks Data Platform and I need to run the K-Means algorithm. I know that K-Means is implemented in Mahout, but the problem is that I don't know how to execute the algorithm, and I don't know how to get data into the HDP Sandbox.

Can someone help me?

Thank you very much.

David.

1 ACCEPTED SOLUTION


Re: Apache Mahout K-Means Algorithm in HDP 2.4 on Hortonworks Sandbox with MapReduce!

Hello David ,

Does it have to be Mahout? In general, Spark MLlib is quite a bit "cooler" now. Here is its web page, with example code. (If it has to be Mahout, I am sure someone can help with that too.)

http://spark.apache.org/docs/latest/mllib-clustering.html
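Before wiring up either framework, it can help to see what k-means actually computes. Below is a minimal, framework-free sketch of Lloyd's algorithm in plain Python; the sample points, k value, and naive initialization are made up for illustration. MLlib and Mahout run this same assign/update iteration, just distributed over the cluster.

```python
# Minimal k-means (Lloyd's algorithm) sketch in plain Python.
# Illustrative only: MLlib and Mahout run this iteration distributed.

def kmeans(points, k, iters=20):
    # Naive init: use the first k points as the starting centroids.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = [sum(dim) / len(cluster) for dim in zip(*cluster)]
    return centroids

points = [(0.0, 0.0), (0.1, 0.2), (9.0, 9.0), (9.2, 8.8)]
print(kmeans(points, k=2))  # two centroids, one near each group of points
```

With these toy points the algorithm converges in a couple of iterations to one centroid near (0.05, 0.1) and one near (9.1, 8.9).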

Regarding Mahout I suppose you found that one already:

https://mahout.apache.org/users/clustering/k-means-clustering.html
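If you do go the Mahout route, the job is launched from the command line rather than from code. Here is a sketch of a typical invocation on a sandbox shell; the HDFS paths are placeholders, and the input must first be converted to Mahout sequence-file vectors (the linked page covers that step):

```shell
# Sketch only: /user/david/... paths are placeholders for your own HDFS dirs.
# Input must already be Mahout vectors in a sequence file.
mahout kmeans \
  -i /user/david/vectors \        # input vectors
  -c /user/david/initial-clusters \  # initial centroids (written here if -k is given)
  -o /user/david/kmeans-output \  # output directory
  -k 5 \                          # number of clusters (random initial centroids)
  -x 10 \                         # maximum iterations
  -ow \                           # overwrite the output directory
  -cl                             # also assign input points to final clusters

# Inspect the resulting clusters in readable form:
mahout clusterdump -i /user/david/kmeans-output/clusters-*-final -o clusters.txt
```

On HDP the Mahout jobs run as MapReduce, which is what you need for your comparison with Spark.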


3 REPLIES


Re: Apache Mahout K-Means Algorithm in HDP 2.4 on Hortonworks Sandbox with MapReduce!

New Contributor

Hello Benjamin,

the problem is that I need both implementations (Spark and MapReduce) in order to make a comparison. For the Spark part, I don't know how to get the data into the Sandbox to run the algorithm. The link http://spark.apache.org/docs/latest/mllib-clustering.html

only shows the code, but how can I create a Spark job in the Sandbox?

Can you help me?


Re: Apache Mahout K-Means Algorithm in HDP 2.4 on Hortonworks Sandbox with MapReduce!

Get data into the cluster? The easiest way is to have a delimited file and do hadoop fs -put file <hdfs location>. You can then read those files with sc.textFile.
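One detail worth spelling out: sc.textFile gives you the file back as plain text lines, so you need a small parsing step before clustering. Here is a sketch in plain Python of the per-line function you would pass to the RDD's map; the comma delimiter and the sample lines are assumptions for illustration:

```python
# Sketch of the per-line parser you would pass to RDD.map after
# sc.textFile: turn one delimited text line into a float vector.
# The comma delimiter and sample lines are assumptions.

def parse_point(line):
    return [float(x) for x in line.strip().split(",")]

# Stand-in for the lines sc.textFile would return from HDFS:
sample_lines = ["1.0,2.0,3.0", "4.5,5.5,6.5"]
vectors = [parse_point(l) for l in sample_lines]
print(vectors)  # [[1.0, 2.0, 3.0], [4.5, 5.5, 6.5]]
```

In Spark this becomes sc.textFile("<hdfs location>").map(parse_point), and the resulting RDD of vectors is what MLlib's KMeans trains on.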

To get comfortable working with Hadoop, I think you should first go through a couple of basic tutorials:

http://hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/
