<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Java Read and Write Spark Vector's to Hdfs in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Java-Read-and-Write-Spark-Vector-s-to-Hdfs/m-p/135833#M98484</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10529/akeezhadath.html" nodeid="10529"&gt;@Arun A K&lt;/A&gt; thank you for your answer. First, I have a Vector, not a List of RDDs.&lt;/P&gt;&lt;P&gt;Second, I am using Java.&lt;/P&gt;</description>
    <pubDate>Wed, 20 Jul 2016 14:27:23 GMT</pubDate>
    <dc:creator>emad_m_refai</dc:creator>
    <dc:date>2016-07-20T14:27:23Z</dc:date>
    <item>
      <title>Java Read and Write Spark Vector's to Hdfs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Java-Read-and-Write-Spark-Vector-s-to-Hdfs/m-p/135831#M98482</link>
      <description>&lt;P&gt;I wrote Vectors (&lt;CODE&gt;org.apache.spark.mllib.linalg.Vector&lt;/CODE&gt;) to &lt;CODE&gt;HDFS&lt;/CODE&gt; as follows:&lt;/P&gt;&lt;PRE&gt;public void writePointsToFile(Path path, FileSystem fs, Configuration conf,
        List&amp;lt;Vector&amp;gt; points) throws IOException {

    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
            Writer.file(path), Writer.keyClass(LongWritable.class),
            Writer.valueClass(Vector.class));

    long recNum = 0;

    for (Vector point : points) {
        writer.append(new LongWritable(recNum++), point);
    }
    writer.close();
}
&lt;/PRE&gt;&lt;P&gt;(I am not sure this is the right way to do it; I can't test it yet.)&lt;/P&gt;&lt;P&gt;Now I need to read this file back as a &lt;CODE&gt;JavaRDD&amp;lt;Vector&amp;gt;&lt;/CODE&gt; because I want to use it for Spark &lt;CODE&gt;K-means&lt;/CODE&gt; clustering, but I don't know how to do this.&lt;/P&gt;</description>
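One caveat worth noting about the snippet above: SequenceFile values must implement `org.apache.hadoop.io.Writable`, and `org.apache.spark.mllib.linalg.Vector` does not, so the `writer.append(...)` call as written would fail. A minimal workaround sketch, assuming a `JavaSparkContext` is available (class and method names below are illustrative, not from the thread), is to store each vector's string form as text and parse it back with `Vectors.parse`:

```java
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

public class VectorTextFiles {

    // Write each Vector in its string form, e.g. "[2.0,3.0,4.0]".
    // This avoids SequenceFile entirely: mllib's Vector is Serializable
    // but not Writable, so it cannot be a SequenceFile value directly.
    public static void write(JavaSparkContext sc, List<Vector> points, String path) {
        JavaRDD<Vector> rdd = sc.parallelize(points);
        rdd.map(Vector::toString).saveAsTextFile(path);
    }

    // Read the lines back and parse each one into a Vector,
    // yielding the JavaRDD<Vector> that MLlib's KMeans expects.
    public static JavaRDD<Vector> read(JavaSparkContext sc, String path) {
        return sc.textFile(path).map(Vectors::parse);
    }
}
```

The round trip relies on `Vectors.parse` accepting exactly the format that `Vector.toString()` produces, for both dense (`[2.0,3.0]`) and sparse (`(3,[0,2],[1.0,3.0])`) vectors.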
      <pubDate>Tue, 19 Jul 2016 22:03:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Java-Read-and-Write-Spark-Vector-s-to-Hdfs/m-p/135831#M98482</guid>
      <dc:creator>emad_m_refai</dc:creator>
      <dc:date>2016-07-19T22:03:24Z</dc:date>
    </item>
    <item>
      <title>Re: Java Read and Write Spark Vector's to Hdfs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Java-Read-and-Write-Spark-Vector-s-to-Hdfs/m-p/135832#M98483</link>
      <description>&lt;P&gt;Try storing the RDD to disk using &lt;CODE&gt;saveAsObjectFile&lt;/CODE&gt;; you can then read it back using &lt;CODE&gt;objectFile()&lt;/CODE&gt;:&lt;/P&gt;&lt;PRE&gt;vectorListRdd.saveAsObjectFile("&amp;lt;path&amp;gt;")
val fileRdd = sc.objectFile[Vector]("&amp;lt;path&amp;gt;")&lt;/PRE&gt;</description>
      <pubDate>Wed, 20 Jul 2016 01:37:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Java-Read-and-Write-Spark-Vector-s-to-Hdfs/m-p/135832#M98483</guid>
      <dc:creator>arunak</dc:creator>
      <dc:date>2016-07-20T01:37:38Z</dc:date>
    </item>
    <item>
      <title>Re: Java Read and Write Spark Vector's to Hdfs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Java-Read-and-Write-Spark-Vector-s-to-Hdfs/m-p/135833#M98484</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10529/akeezhadath.html" nodeid="10529"&gt;@Arun A K&lt;/A&gt; thank you for your answer. First, I have a Vector, not a List of RDDs.&lt;/P&gt;&lt;P&gt;Second, I am using Java.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Jul 2016 14:27:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Java-Read-and-Write-Spark-Vector-s-to-Hdfs/m-p/135833#M98484</guid>
      <dc:creator>emad_m_refai</dc:creator>
      <dc:date>2016-07-20T14:27:23Z</dc:date>
    </item>
    <item>
      <title>Re: Java Read and Write Spark Vector's to Hdfs</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Java-Read-and-Write-Spark-Vector-s-to-Hdfs/m-p/135834#M98485</link>
      <description>&lt;P&gt;I thought you had the option of writing the file to HDFS from a Spark application, and hence assumed you had a List of Vectors from which you could build an RDD by calling &lt;CODE&gt;parallelize&lt;/CODE&gt;.&lt;/P&gt;&lt;PRE&gt;val sc = new SparkContext("local[1]", "Sample App")
val v1: Vector = Vectors.dense(2.0, 3.0, 4.0)
val v2: Vector = Vectors.dense(5.0, 6.0, 7.0)
val list = new ListBuffer[Vector]()
list += v1
list += v2
val listRdd = sc.parallelize(list)
listRdd.saveAsObjectFile("localFile")
// read it back as an RDD of Vector in another application
val fileRdd = sc.objectFile[Vector]("localFile")
&lt;/PRE&gt;&lt;P&gt;These methods are also available on &lt;A href="https://spark.apache.org/docs/1.4.0/api/java/index.html?org/apache/spark/api/java/JavaSparkContext.html"&gt;JavaSparkContext&lt;/A&gt; and JavaRDD.&lt;/P&gt;</description>
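Since the asker is working in Java, the same object-file round trip can be sketched with the Java API (the master setting and output path below are placeholders, not from the thread):

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

public class VectorObjectFileDemo {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setMaster("local[1]").setAppName("Sample App"));

        JavaRDD<Vector> vectors = sc.parallelize(Arrays.asList(
                Vectors.dense(2.0, 3.0, 4.0),
                Vectors.dense(5.0, 6.0, 7.0)));

        // Vector is Serializable, so saveAsObjectFile handles it directly;
        // an hdfs:// path works the same way as this local placeholder.
        vectors.saveAsObjectFile("vectors.obj");

        // Read back as JavaRDD<Vector>, ready to feed to KMeans.train(...).
        JavaRDD<Vector> restored = sc.objectFile("vectors.obj");
        System.out.println(restored.count());

        sc.stop();
    }
}
```

`JavaSparkContext.objectFile` is generic, so the result can be assigned straight to `JavaRDD<Vector>` without the explicit type parameter the Scala version uses.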
      <pubDate>Wed, 20 Jul 2016 20:12:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Java-Read-and-Write-Spark-Vector-s-to-Hdfs/m-p/135834#M98485</guid>
      <dc:creator>arunak</dc:creator>
      <dc:date>2016-07-20T20:12:24Z</dc:date>
    </item>
  </channel>
</rss>

