07-20-2016 02:49 PM
I think you should look at the Spark RDD programming introduction: http://spark.apache.org/docs/latest/programming-guide.html

What you get is an RDD of integers, and you can then use Spark functions like map/foreach to work with it. So the question is what you actually want to do, and why you want a List in the first place. You can call rdd.collect to gather everything into one big Array on your driver, but that is most likely not what you actually want. For example, clusterPoints.collect() will give you an array of points in your local driver; however, it downloads all results to the driver and no longer runs in parallel. If that works with your data volumes, great. But normally you should use Spark functions like map to do the computation in parallel.

Below is a scoring example from http://blog.sequenceiq.com/blog/2014/07/31/spark-mllib/ that scores the data point by point, so you could do other things in this function as well, whatever you want to do with the information.

import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}

// Train a k-means model, then predict the cluster for each point.
val clusters: KMeansModel = KMeans.train(data, K, maxIteration, runs)
val vectorsAndClusterIdx = data.map { point =>
  val prediction = clusters.predict(point)
  (point.toString, prediction)
}
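If you do need the assignments outside the cluster afterwards, here is a minimal sketch of the two options above, continuing from the snippet (the output path is a hypothetical placeholder):

// Stay distributed: write the results from the executors.
vectorsAndClusterIdx.saveAsTextFile("hdfs:///tmp/cluster-assignments")  // hypothetical path

// Or, only if the result is small, pull everything to the driver.
val assignments: Array[(String, Int)] = vectorsAndClusterIdx.collect()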
07-20-2016 01:12 PM
I thought you had an option to write the file to HDFS using a Spark application, so I assumed you had a list of Vectors from which you can build an RDD by calling parallelize:

import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import scala.collection.mutable.ListBuffer

val sc = new SparkContext("local[1]", "Sample App")
val v1: Vector = Vectors.dense(2.0, 3.0, 4.0)
val v2: Vector = Vectors.dense(5.0, 6.0, 7.0)
val list = new ListBuffer[Vector]()
list += v1
list += v2
val listRdd = sc.parallelize(list)
listRdd.saveAsObjectFile("localFile")
// read it back to an RDD as vector in another application
val fileRdd = sc.objectFile[Vector]("localFile")
The same methods are also available on JavaSparkContext and JavaRDD if you are working in Java.
07-20-2016 01:44 PM
The easiest way is to use the saveAsObjectFile method and read the data back with the objectFile method. You can find further details on both in the Spark documentation.
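A minimal sketch of that round trip, assuming MLlib Vectors as in the other thread (the path and app name are just placeholders):

import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.{Vector, Vectors}

val sc = new SparkContext("local[1]", "ObjectFileDemo")

// saveAsObjectFile serializes the RDD's elements into a SequenceFile on disk.
val vectors = sc.parallelize(Seq(Vectors.dense(1.0, 2.0), Vectors.dense(3.0, 4.0)))
vectors.saveAsObjectFile("/tmp/vectors")

// objectFile reads it back; the type parameter tells Spark what to deserialize to.
val restored = sc.objectFile[Vector]("/tmp/vectors")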