<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question In Java Convert Mahout Vector to Spark Vector in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/In-Java-Convert-Mahout-Vector-to-Spark-Vector/m-p/135310#M35094</link>
    <description>&lt;P&gt;I have a &lt;CODE&gt;VectorWritable&lt;/CODE&gt; &lt;CODE&gt;(org.apache.mahout.math.VectorWritable)&lt;/CODE&gt; that comes from a sequence file generated by &lt;CODE&gt;Mahout&lt;/CODE&gt;, written roughly as follows.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;public void write(List&amp;lt;Vector&amp;gt; points, int clustersNumber, HdfsConnector connector) throws IOException {
    // Write the input points to HDFS
    this.writePointsToFile(new Path(connector.getPointsInput(), "pointsInput"), connector.getFs(), connector.getConf(), points);

    // Write the initial cluster centroids as (Text, Kluster) pairs
    Path clusterCentroids = new Path(connector.getClustersInput(), "part-0");
    SequenceFile.Writer writer = SequenceFile.createWriter(
            connector.getConf(), Writer.file(clusterCentroids), Writer.keyClass(Text.class), Writer.valueClass(Kluster.class));

    List&amp;lt;Vector&amp;gt; centroids = getCentroids;
    for (int i = 0; i &amp;lt; centroids.size(); i++) {
        Vector vect = centroids.get(i);
        Kluster centroidCluster = new Kluster(vect, i, new SquaredEuclideanDistanceMeasure());
        writer.append(new Text((centroidCluster).getIdentifier()), centroidCluster);
    }
    writer.close();
}&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I would like to convert it into a Spark &lt;CODE&gt;Vector&lt;/CODE&gt; &lt;CODE&gt;(org.apache.spark.mllib.linalg.Vector)&lt;/CODE&gt; and load the result as a &lt;CODE&gt;JavaRDD&amp;lt;Vector&amp;gt;&lt;/CODE&gt;. How can I do that in Java?&lt;/P&gt;&lt;P&gt;I've read about &lt;CODE&gt;sequenceFile&lt;/CODE&gt; in Spark, but I couldn't figure out how to use it for this.&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 10:30:34 GMT</pubDate>
    <dc:creator>emad_m_refai</dc:creator>
    <dc:date>2022-09-16T10:30:34Z</dc:date>
    <item>
      <title>In Java Convert Mahout Vector to Spark Vector</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/In-Java-Convert-Mahout-Vector-to-Spark-Vector/m-p/135310#M35094</link>
      <description>&lt;P&gt;I have a &lt;CODE&gt;VectorWritable&lt;/CODE&gt; &lt;CODE&gt;(org.apache.mahout.math.VectorWritable)&lt;/CODE&gt; that comes from a sequence file generated by &lt;CODE&gt;Mahout&lt;/CODE&gt;, written roughly as follows.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;public void write(List&amp;lt;Vector&amp;gt; points, int clustersNumber, HdfsConnector connector) throws IOException {
    // Write the input points to HDFS
    this.writePointsToFile(new Path(connector.getPointsInput(), "pointsInput"), connector.getFs(), connector.getConf(), points);

    // Write the initial cluster centroids as (Text, Kluster) pairs
    Path clusterCentroids = new Path(connector.getClustersInput(), "part-0");
    SequenceFile.Writer writer = SequenceFile.createWriter(
            connector.getConf(), Writer.file(clusterCentroids), Writer.keyClass(Text.class), Writer.valueClass(Kluster.class));

    List&amp;lt;Vector&amp;gt; centroids = getCentroids;
    for (int i = 0; i &amp;lt; centroids.size(); i++) {
        Vector vect = centroids.get(i);
        Kluster centroidCluster = new Kluster(vect, i, new SquaredEuclideanDistanceMeasure());
        writer.append(new Text((centroidCluster).getIdentifier()), centroidCluster);
    }
    writer.close();
}&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I would like to convert it into a Spark &lt;CODE&gt;Vector&lt;/CODE&gt; &lt;CODE&gt;(org.apache.spark.mllib.linalg.Vector)&lt;/CODE&gt; and load the result as a &lt;CODE&gt;JavaRDD&amp;lt;Vector&amp;gt;&lt;/CODE&gt;. How can I do that in Java?&lt;/P&gt;&lt;P&gt;I've read about &lt;CODE&gt;sequenceFile&lt;/CODE&gt; in Spark, but I couldn't figure out how to use it for this.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:30:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/In-Java-Convert-Mahout-Vector-to-Spark-Vector/m-p/135310#M35094</guid>
      <dc:creator>emad_m_refai</dc:creator>
      <dc:date>2022-09-16T10:30:34Z</dc:date>
    </item>
    <item>
      <title>Re: In Java Convert Mahout Vector to Spark Vector</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/In-Java-Convert-Mahout-Vector-to-Spark-Vector/m-p/135311#M35095</link>
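      <description>&lt;P&gt;You can convert an org.apache.mahout.math.Vector into an org.apache.spark.mllib.linalg.Vector by using the iterateNonZero() or iterateAll() methods of org.apache.mahout.math.Vector.&lt;/P&gt;&lt;P&gt;If your Vector is sparse, the first option is the best. In this case you can use iterateNonZero() to build two arrays, one containing all the non-zero indexes and the other the corresponding values. Note that Vectors.sparse takes primitive int[] and double[] arrays, i.e.&lt;/P&gt;&lt;PRE&gt;ArrayList&amp;lt;Double&amp;gt; values = new ArrayList&amp;lt;Double&amp;gt;();
ArrayList&amp;lt;Integer&amp;gt; indexes = new ArrayList&amp;lt;Integer&amp;gt;();
org.apache.mahout.math.Vector v = ...
// If your Mahout version lacks iterateNonZero(), newer releases expose nonZeroes() instead.
Iterator&amp;lt;Element&amp;gt; it = v.iterateNonZero();
while (it.hasNext()) {
	Element e = it.next();
	values.add(e.get());
	indexes.add(e.index());
}
// Vectors.sparse needs primitive int[] and double[] arrays, so unbox the lists.
// Spark assumes the indexes are strictly increasing; sort the entries first if
// the Mahout vector does not iterate in index order.
int[] idx = new int[indexes.size()];
double[] vals = new double[values.size()];
for (int i = 0; i &amp;lt; idx.length; i++) {
	idx[i] = indexes.get(i);
	vals[i] = values.get(i);
}
org.apache.spark.mllib.linalg.Vector sparkVector = Vectors.sparse(v.size(), idx, vals);&lt;/PRE&gt;&lt;P&gt;You can do the same thing for a dense Vector using the iterateAll() method and Vectors.dense, which takes a double[] holding all the values.&lt;/P&gt;</description>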
      <pubDate>Wed, 20 Jul 2016 15:22:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/In-Java-Convert-Mahout-Vector-to-Spark-Vector/m-p/135311#M35095</guid>
      <dc:creator>mgaido</dc:creator>
      <dc:date>2016-07-20T15:22:17Z</dc:date>
    </item>
    <item>
      <title>Re: In Java Convert Mahout Vector to Spark Vector</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/In-Java-Convert-Mahout-Vector-to-Spark-Vector/m-p/135312#M35096</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3670/gaido.html" nodeid="3670"&gt;@Marco Gaido&lt;/A&gt; thank you for your answer, it's really helpful.&lt;/P&gt;&lt;P&gt;Can you please tell me how to store the vectors in HDFS after converting them, and then read them back from HDFS to use them for clustering with Spark KMeans, as in &lt;STRONG&gt;&lt;EM&gt;KMeansModel clusters = KMeans.train&lt;/EM&gt;&lt;/STRONG&gt;?&lt;/P&gt;</description>
      <pubDate>Wed, 20 Jul 2016 16:25:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/In-Java-Convert-Mahout-Vector-to-Spark-Vector/m-p/135312#M35096</guid>
      <dc:creator>emad_m_refai</dc:creator>
      <dc:date>2016-07-20T16:25:04Z</dc:date>
    </item>
    <item>
      <title>Re: In Java Convert Mahout Vector to Spark Vector</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/In-Java-Convert-Mahout-Vector-to-Spark-Vector/m-p/135313#M35097</link>
      <description>&lt;P&gt;The easiest way is to save the RDD with the saveAsObjectFile method and read it back with the objectFile method of the Spark context. Both are described in the Spark documentation.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Jul 2016 20:44:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/In-Java-Convert-Mahout-Vector-to-Spark-Vector/m-p/135313#M35097</guid>
      <dc:creator>mgaido</dc:creator>
      <dc:date>2016-07-20T20:44:16Z</dc:date>
    </item>
  </channel>
</rss>

