Created 07-19-2016 03:03 PM
I wrote my Vectors to a SequenceFile with the method below:

    import java.io.IOException;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.SequenceFile.Writer;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.VectorWritable;

    // fs is unused with the option-based createWriter API
    public void writePointsToFile(Path path, FileSystem fs, Configuration conf, List<Vector> points) throws IOException {
        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                Writer.file(path),
                Writer.keyClass(LongWritable.class),
                Writer.valueClass(VectorWritable.class));
        long recNum = 0;
        for (Vector point : points) {
            // Vector itself is not Writable, so wrap each point in Mahout's VectorWritable
            writer.append(new LongWritable(recNum++), new VectorWritable(point));
        }
        writer.close();
    }

(I'm not sure that I used the right way to do that; I can't test it yet.) Now I need to read this file back.
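A minimal sketch of the matching read path, assuming the writer above stores VectorWritable values (Mahout's Writable wrapper) under LongWritable keys; untested, so treat it as a starting point rather than a definitive implementation:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.SequenceFile.Reader;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.VectorWritable;

    public List<Vector> readPointsFromFile(Path path, Configuration conf) throws IOException {
        List<Vector> points = new ArrayList<>();
        SequenceFile.Reader reader = new SequenceFile.Reader(conf, Reader.file(path));
        try {
            LongWritable key = new LongWritable();
            VectorWritable value = new VectorWritable();
            // next() fills key and value in place and returns false at end of file
            while (reader.next(key, value)) {
                points.add(value.get());
            }
        } finally {
            reader.close();
        }
        return points;
    }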
Created 07-19-2016 06:37 PM
Try storing the RDD to disk using saveAsObjectFile, then you can read it back using objectFile():

    vectorListRdd.saveAsObjectFile("<path>")
    val fileRdd = sc.objectFile[Vector]("<path>")
Created 07-20-2016 07:27 AM
@Arun A K, thank you for your answer. First, I have a List of Vector, not an RDD.
Second, I am using Java.
Created 07-20-2016 01:12 PM
I thought you had the option of writing the file to HDFS from a Spark application, so I assumed you had a list of Vectors from which you could make an RDD by calling parallelize.
    import scala.collection.mutable.ListBuffer

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.linalg.{Vector, Vectors}

    val sc = new SparkContext("local[1]", "Sample App")
    val v1: Vector = Vectors.dense(2.0, 3.0, 4.0)
    val v2: Vector = Vectors.dense(5.0, 6.0, 7.0)
    val list = new ListBuffer[Vector]()
    list += v1
    list += v2
    val listRdd = sc.parallelize(list)
    listRdd.saveAsObjectFile("localFile")

    // read it back to an RDD of Vector in another application
    val fileRdd = sc.objectFile[Vector]("localFile")
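Since you are on Java, here is a rough equivalent using the Spark Java API. This is a sketch, assuming spark-mllib on the classpath; the class name and the "localFile" path are placeholders:

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;

    public class VectorFileExample {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(
                    new SparkConf().setMaster("local[1]").setAppName("Sample App"));

            List<Vector> points = Arrays.asList(
                    Vectors.dense(2.0, 3.0, 4.0),
                    Vectors.dense(5.0, 6.0, 7.0));

            // Parallelize the list into an RDD and persist it with Java serialization
            JavaRDD<Vector> listRdd = sc.parallelize(points);
            listRdd.saveAsObjectFile("localFile");

            // Read it back in another application
            JavaRDD<Vector> fileRdd = sc.objectFile("localFile");

            sc.stop();
        }
    }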