Created on 05-16-2016 12:04 PM - edited 09-16-2022 03:19 AM
Hello,
I am studying the Hortonworks Data Platform and I need to run the K-Means algorithm. I know that K-Means is available in Mahout, but the problem is that I don't know how to execute it, and I don't know how to load data into the HDP Sandbox.
Can someone help me?
Thank you very much.
David.
Created 05-16-2016 12:17 PM
Hello David,
Does it have to be Mahout? In general, Spark MLlib is quite a bit "cooler" these days. Here is its web page, with example code. (If it has to be Mahout, I am sure someone can help with that too.)
http://spark.apache.org/docs/latest/mllib-clustering.html
Regarding Mahout, I suppose you have already found this:
https://mahout.apache.org/users/clustering/k-means-clustering.html
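For the MLlib route, a minimal sketch along the lines of the k-means example on the linked page, runnable line by line in spark-shell on the Sandbox (the HDFS path, cluster count, and iteration count below are just placeholders):

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data: one space-separated point per line
// (the path is illustrative -- point it at your own file in HDFS)
val data = sc.textFile("hdfs:///user/david/kmeans_data.txt")
val parsed = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

// Cluster the data into 2 groups, with at most 20 iterations
val numClusters = 2
val numIterations = 20
val model = KMeans.train(parsed, numClusters, numIterations)

// Evaluate clustering quality: Within Set Sum of Squared Errors
val wssse = model.computeCost(parsed)
println(s"Within Set Sum of Squared Errors = $wssse")
```

In spark-shell, `sc` (the SparkContext) is already created for you, so you can paste this directly after your data is in HDFS.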
Created 05-16-2016 02:01 PM
Hello Benjamin,
the problem is that I need both implementations (Spark and MapReduce) in order to make a comparison. For the Spark part, I don't know how to load the data into the Sandbox to run the algorithm. The link http://spark.apache.org/docs/latest/mllib-clustering.html
only shows the code, but how can I create a Spark job in the Sandbox?
Can you help me?
Created 05-16-2016 02:38 PM
Get data into the cluster? The easiest way is to have a delimited file and do hadoop fs -put file <hdfs location>. You can then read those files with sc.textFile.
I think you should first go through a couple of basic tutorials on working with Hadoop:
http://hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/
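The put-then-read flow above might look like this from a shell on the Sandbox (the file name and HDFS directory are illustrative, not required names):

```shell
# Create a directory in HDFS and copy a local, delimited data file into it
# (paths here are placeholders -- use your own)
hadoop fs -mkdir -p /user/david
hadoop fs -put kmeans_data.txt /user/david/

# Verify the upload
hadoop fs -ls /user/david

# Afterwards, inside spark-shell, the file can be read with:
#   val data = sc.textFile("hdfs:///user/david/kmeans_data.txt")
```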