Support Questions

Find answers, ask questions, and share your expertise


In Java Convert Mahout Vector to Spark Vector




Contributor

Created 07-19-2016 11:56 AM


I have a `VectorWritable` (`org.apache.mahout.math.VectorWritable`) that comes from a sequence file generated by Mahout with code like the following.

```java
public void write(List<Vector> points, int clustersNumber, HdfsConnector connector) throws IOException {
    this.writePointsToFile(new Path(connector.getPointsInput(), "pointsInput"),
            connector.getFs(), connector.getConf(), points);
    Path clusterCentroids = new Path(connector.getClustersInput(), "part-0");
    SequenceFile.Writer writer = SequenceFile.createWriter(connector.getConf(),
            Writer.file(clusterCentroids),
            Writer.keyClass(Text.class),
            Writer.valueClass(Kluster.class));
    List<Vector> centroids = getCentroids();
    for (int i = 0; i < centroids.size(); i++) {
        Vector vect = centroids.get(i);
        Kluster centroidCluster = new Kluster(vect, i, new SquaredEuclideanDistanceMeasure());
        writer.append(new Text(centroidCluster.getIdentifier()), centroidCluster);
    }
    writer.close();
}
```

and I would like to convert it into a Spark `Vector` (`org.apache.spark.mllib.linalg.Vectors`) as a `JavaRDD<Vector>`.

How can I do that in Java? I've read about `sequenceFile` in Spark, but I couldn't figure out how to use it.

1 ACCEPTED SOLUTION


Rising Star

Created 07-20-2016 08:22 AM


You can convert an `org.apache.mahout.math.Vector` into an `org.apache.spark.mllib.linalg.Vector` using the `iterateNonZero()` or `iterateAll()` methods of `org.apache.mahout.math.Vector`.

If your Vector is sparse, the first option is best. In that case you can build two arrays via `iterateNonZero()`: one containing all the non-zero indexes and the other the corresponding values, i.e.

```java
ArrayList<Double> values = new ArrayList<>();
ArrayList<Integer> indexes = new ArrayList<>();
org.apache.mahout.math.Vector v = ...
Iterator<Element> it = v.iterateNonZero();
while (it.hasNext()) {
    Element e = it.next();
    values.add(e.get());
    indexes.add(e.index());
}
// Vectors.sparse expects primitive arrays (int[], double[]), so unbox
// the collected entries rather than calling toArray with boxed types:
int[] idx = new int[indexes.size()];
double[] vals = new double[values.size()];
for (int i = 0; i < idx.length; i++) {
    idx[i] = indexes.get(i);
    vals[i] = values.get(i);
}
Vectors.sparse(v.size(), idx, vals);
```

You can do the same thing for a dense Vector using the `iterateAll()` method and `Vectors.dense()`.
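To get from the Mahout sequence file all the way to a `JavaRDD<Vector>`, here is a minimal sketch. It assumes a running `JavaSparkContext`; the class name `MahoutToSpark` and its method names are mine, not from this thread. Note that Mahout's `iterateNonZero()` makes no ordering guarantee, so the sketch sorts entries before calling `Vectors.sparse`, which expects increasing indices.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

import org.apache.hadoop.io.Text;
import org.apache.mahout.math.VectorWritable;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

public class MahoutToSpark {

    // Read the sequence file written by Mahout and convert each value
    // to a Spark mllib Vector. The path argument is a placeholder.
    public static JavaRDD<Vector> readVectors(JavaSparkContext sc, String path) {
        return sc.sequenceFile(path, Text.class, VectorWritable.class)
                 // VectorWritable.get() returns the underlying Mahout Vector;
                 // converting immediately inside map() copies the data out,
                 // which sidesteps Hadoop's Writable-instance reuse.
                 .map(pair -> toSparkVector(pair._2().get()));
    }

    // Convert one Mahout Vector to a Spark sparse Vector.
    public static Vector toSparkVector(org.apache.mahout.math.Vector v) {
        // Collect non-zero entries sorted by index, since Spark's
        // SparseVector expects strictly increasing indices.
        SortedMap<Integer, Double> entries = new TreeMap<>();
        Iterator<org.apache.mahout.math.Vector.Element> it = v.iterateNonZero();
        while (it.hasNext()) {
            org.apache.mahout.math.Vector.Element e = it.next();
            entries.put(e.index(), e.get());
        }
        int[] indices = new int[entries.size()];
        double[] values = new double[entries.size()];
        int i = 0;
        for (Map.Entry<Integer, Double> en : entries.entrySet()) {
            indices[i] = en.getKey();
            values[i] = en.getValue();
            i++;
        }
        return Vectors.sparse(v.size(), indices, values);
    }
}
```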

3 REPLIES 3


Re: In Java Convert Mahout Vector to Spark Vector

Contributor

Created 07-20-2016 09:25 AM


@Marco Gaido thank you for your answer, it's really helpful.

Can you please tell me how to store the vectors to HDFS after converting them, and then read them back from HDFS to use in Spark k-means clustering, as in *KMeansModel clusters = KMeans.train*?

Re: In Java Convert Mahout Vector to Spark Vector

Rising Star

Created 07-20-2016 01:44 PM

The easiest way is to use the saveAsObjectFile method and read the data back through the objectFile method. See the Spark documentation for further details on both.
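Putting the two replies together, here is a sketch of the store-reload-cluster round trip. The path, `k`, and iteration count are placeholders, and the class name `StoreAndCluster` is mine:

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vector;

public class StoreAndCluster {

    // Persist the converted vectors to the given path, reload them,
    // and run k-means on the reloaded data.
    public static KMeansModel run(JavaSparkContext sc, JavaRDD<Vector> vectors,
                                  String path, int k) {
        // Serializes the Vectors, writing one object file per partition
        vectors.saveAsObjectFile(path);

        // Reads the serialized Vectors back as a JavaRDD<Vector>
        JavaRDD<Vector> reloaded = sc.objectFile(path);

        int maxIterations = 20; // placeholder
        return KMeans.train(reloaded.rdd(), k, maxIterations);
    }
}
```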
