Created on 05-16-2016 12:04 PM - edited 09-16-2022 03:19 AM
Hello,
I am studying the Hortonworks Data Platform and I need to run the K-Means algorithm. I know that K-Means is available in Mahout, but the problem is that I don't know how to execute it, and I don't know how to load data into the HDP Sandbox.
Can someone help me?
Thank you very much.
David.
Created 05-16-2016 12:17 PM
Hello David,
Does it have to be Mahout? In general, Spark MLlib is quite a bit "cooler" these days. Here is its web page, with example code. (If it has to be Mahout, I am sure someone can help with that too.)
http://spark.apache.org/docs/latest/mllib-clustering.html
Regarding Mahout, I suppose you have already found this:
https://mahout.apache.org/users/clustering/k-means-clustering.html
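For the MLlib route, a minimal sketch along the lines of the k-means example on the linked page, runnable line by line in spark-shell on the Sandbox (the HDFS path, cluster count, and iteration count below are just placeholders):

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data: one space-separated point per line
// (the path is illustrative -- point it at your own file in HDFS)
val data = sc.textFile("hdfs:///user/david/kmeans_data.txt")
val parsed = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

// Cluster the data into 2 groups, with at most 20 iterations
val numClusters = 2
val numIterations = 20
val model = KMeans.train(parsed, numClusters, numIterations)

// Evaluate clustering quality: Within Set Sum of Squared Errors
val wssse = model.computeCost(parsed)
println(s"Within Set Sum of Squared Errors = $wssse")
```

In spark-shell, `sc` (the SparkContext) is already created for you, so you can paste this directly after your data is in HDFS.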
Created 05-16-2016 02:01 PM
Hello Benjamin,
the problem is that I need both implementations (Spark and MapReduce) in order to make a comparison. For the Spark part, I don't know how to load the data into the Sandbox to run the algorithm. The link http://spark.apache.org/docs/latest/mllib-clustering.html
only shows the code, but how can I create a Spark job in the Sandbox?
Can you help me?
Created 05-16-2016 02:38 PM
Get data into the cluster? The easiest way is to have a delimited file and do hadoop fs -put file <hdfs location>. You can then read those files with sc.textFile.
I think you should first go through a couple of basic tutorials on working with Hadoop:
http://hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/
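The put-then-read flow above might look like this from a shell on the Sandbox (the file name and HDFS directory are illustrative, not required names):

```shell
# Create a directory in HDFS and copy a local, delimited data file into it
# (paths here are placeholders -- use your own)
hadoop fs -mkdir -p /user/david
hadoop fs -put kmeans_data.txt /user/david/

# Verify the upload
hadoop fs -ls /user/david

# Afterwards, inside spark-shell, the file can be read with:
#   val data = sc.textFile("hdfs:///user/david/kmeans_data.txt")
```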