07-20-2016 02:49 PM
I think you should look at the Spark RDD programming introduction: http://spark.apache.org/docs/latest/programming-guide.html

What you get is an RDD of integers, and you can then use Spark functions like map/foreach to work with it. So the question is what you actually want to do, and why you want a List in the first place. You can call rdd.collect to gather everything into one big Array on your driver, but that is most likely not what you actually want. For example, clusterPoints.collect() will give you an array of points in your local driver; however, it downloads all results to the driver and no longer runs in parallel. If that works with your data volumes, great. But normally you should use Spark functions like map to do the computation in parallel.

Below is a scoring example from http://blog.sequenceiq.com/blog/2014/07/31/spark-mllib/ that scores the data point by point, so you could do other things in this function as well, whatever you want to do with the information.

import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}

// Train a k-means model, then predict the cluster for each point.
val clusters: KMeansModel = KMeans.train(data, K, maxIteration, runs)
val vectorsAndClusterIdx = data.map { point =>
  val prediction = clusters.predict(point)
  (point.toString, prediction)
}
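If you do need the assignments outside the cluster afterwards, here is a minimal sketch of the two options above, continuing from the snippet (the output path is a hypothetical placeholder):

// Stay distributed: write the results from the executors.
vectorsAndClusterIdx.saveAsTextFile("hdfs:///tmp/cluster-assignments")  // hypothetical path

// Or, only if the result is small, pull everything to the driver.
val assignments: Array[(String, Int)] = vectorsAndClusterIdx.collect()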
07-20-2016 01:12 PM
I thought you had an option to write the file to HDFS using a Spark application, so I assumed you had a list of Vectors from which you can build an RDD by calling parallelize:

import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import scala.collection.mutable.ListBuffer

val sc = new SparkContext("local[1]", "Sample App")
val v1: Vector = Vectors.dense(2.0, 3.0, 4.0)
val v2: Vector = Vectors.dense(5.0, 6.0, 7.0)
val list = new ListBuffer[Vector]()
list += v1
list += v2
val listRdd = sc.parallelize(list)
listRdd.saveAsObjectFile("localFile")
// read it back to an RDD as vector in another application
val fileRdd = sc.objectFile[Vector]("localFile")
The same methods are also available on JavaSparkContext and JavaRDD if you are working in Java.
07-20-2016 01:44 PM
The easiest way is to use the saveAsObjectFile method and read the data back with the objectFile method. You can find further details on both in the Spark documentation.
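A minimal sketch of that round trip, assuming MLlib Vectors as in the other thread (the path and app name are just placeholders):

import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.{Vector, Vectors}

val sc = new SparkContext("local[1]", "ObjectFileDemo")

// saveAsObjectFile serializes the RDD's elements into a SequenceFile on disk.
val vectors = sc.parallelize(Seq(Vectors.dense(1.0, 2.0), Vectors.dense(3.0, 4.0)))
vectors.saveAsObjectFile("/tmp/vectors")

// objectFile reads it back; the type parameter tells Spark what to deserialize to.
val restored = sc.objectFile[Vector]("/tmp/vectors")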