Member since: 05-30-2016
Posts: 25
Kudos Received: 5
Solutions: 1
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 1258 | 06-01-2016 03:07 PM
07-20-2016
02:42 PM
@Benjamin Leonhardi
This is what I think I can do:
KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);
JavaRDD<Integer> clusterPoints = clusters.predict(parsedData);
List<Integer> list = clusterPoints.collect(); // collect() replaces the deprecated toArray()
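A hedged sketch of one way to also keep the coordinates together with the cluster assignments (assuming parsedData is the same JavaRDD<Vector> the model was trained on; zip pairs the i-th point with the i-th prediction, which lines up here because predict is a plain map over parsedData):
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.mllib.linalg.Vector;
import scala.Tuple2;

// Sketch: pair every input vector with the id of its assigned cluster.
JavaPairRDD<Vector, Integer> pointsWithCluster = parsedData.zip(clusterPoints);
List<Tuple2<Vector, Integer>> assignments = pointsWithCluster.collect();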
... View more
07-20-2016
02:31 PM
@Benjamin Leonhardi thank you for your answer. Can you please tell me how to extract the cluster information as a List<Integer>, where the list contains the coordinates of the clustered data?
... View more
07-20-2016
12:44 PM
Hello, can you please explain what kind of data I get when I use Spark clustering from MLlib, like the following:
KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);
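For context, a hedged sketch of what the model returned by KMeans.train exposes in the MLlib 1.x Java API (assuming parsedData, numClusters and numIterations are defined as in the snippet above):
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vector;

// Sketch: KMeans.train returns a KMeansModel, i.e. the k learned centroids
// plus methods for assigning points and evaluating the clustering.
KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);

Vector[] centers = clusters.clusterCenters();          // one centroid Vector per cluster
JavaRDD<Integer> ids = clusters.predict(parsedData);   // cluster id for every input point
double wssse = clusters.computeCost(parsedData.rdd()); // within-set sum of squared errors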
... View more
07-20-2016
09:25 AM
@Marco Gaido thank you for your answer, it's really helpful. Can you please tell me how to store the vectors to HDFS after converting them, and then read them back from HDFS to use them in Spark k-means for clustering, as in KMeansModel clusters = KMeans.train
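A hedged sketch of one simple round trip, assuming the converted vectors are in a List<Vector> called points and sc is an existing JavaSparkContext: since org.apache.spark.mllib.linalg.Vector is Java-serializable, saveAsObjectFile and objectFile can store and reload them without any Writable wrappers (the HDFS path is a made-up example):
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vector;

// Sketch: write the vectors to HDFS, then read them back for k-means.
JavaRDD<Vector> vectors = sc.parallelize(points);
vectors.saveAsObjectFile("hdfs:///user/me/kmeans-input"); // hypothetical path

JavaRDD<Vector> reloaded = sc.objectFile("hdfs:///user/me/kmeans-input");
KMeansModel model = KMeans.train(reloaded.rdd(), numClusters, numIterations);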
... View more
07-20-2016
07:27 AM
@Arun A K thank you for your answer. First, I have a Vector, not a List of RDDs; second, I am using Java.
... View more
07-19-2016
03:03 PM
I wrote Vectors (org.apache.spark.mllib.linalg.Vector) to HDFS as follows:
public void writePointsToFile(Path path, FileSystem fs, Configuration conf,
        List<Vector> points) throws IOException {
    // Note: SequenceFile values are normally expected to be Writable;
    // org.apache.spark.mllib.linalg.Vector is not, so this append may
    // fail at runtime unless a matching serialization is configured.
    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
            Writer.file(path), Writer.keyClass(LongWritable.class),
            Writer.valueClass(Vector.class));
    long recNum = 0;
    for (Vector point : points) {
        writer.append(new LongWritable(recNum++), point);
    }
    writer.close();
}
(I am not sure I did that the right way; I can't test it yet.) Now I need to read this file back as a JavaRDD<Vector>, because I want to use it in Spark k-means clustering, but I don't know how to do this.
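A hedged sketch of the read side, with one caveat: SequenceFile values normally have to be Writable, and Spark's mllib Vector is not, so the writer above would need the vectors wrapped in something like Mahout's VectorWritable (as in the related post below). Assuming that wrapping, an existing JavaSparkContext sc, and path being the Path the file was written to:
import org.apache.hadoop.io.LongWritable;
import org.apache.mahout.math.VectorWritable;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

// Sketch: read (key, VectorWritable) records and convert each value
// into a Spark mllib Vector.
JavaPairRDD<LongWritable, VectorWritable> records =
        sc.sequenceFile(path.toString(), LongWritable.class, VectorWritable.class);
JavaRDD<Vector> vectors = records.map(rec -> {
    org.apache.mahout.math.Vector mv = rec._2().get();
    double[] values = new double[mv.size()];
    for (int i = 0; i < mv.size(); i++) {
        values[i] = mv.get(i); // copy: Hadoop reuses Writable instances
    }
    return Vectors.dense(values);
});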
... View more
07-19-2016
11:56 AM
I have a VectorWritable (org.apache.mahout.math.VectorWritable) which is coming from a sequence file generated by Mahout, written by something like the following:
public void write(List<Vector> points, int clustersNumber, HdfsConnector connector) throws IOException {
    this.writePointsToFile(new Path(connector.getPointsInput(), "pointsInput"),
            connector.getFs(), connector.getConf(), points);
    Path clusterCentroids = new Path(connector.getClustersInput(), "part-0");
    SequenceFile.Writer writer = SequenceFile.createWriter(
            connector.getConf(), Writer.file(clusterCentroids),
            Writer.keyClass(Text.class), Writer.valueClass(Kluster.class));
    List<Vector> centroids = getCentroids();
    for (int i = 0; i < centroids.size(); i++) {
        Vector vect = centroids.get(i);
        Kluster centroidCluster = new Kluster(vect, i, new SquaredEuclideanDistanceMeasure());
        writer.append(new Text(centroidCluster.getIdentifier()), centroidCluster);
    }
    writer.close();
}
and I would like to convert that into the Spark Vector type (org.apache.spark.mllib.linalg.Vectors) as a JavaRDD<Vector>. How can I do that in Java? I've read something about sequenceFile in Spark, but I couldn't figure out how to do it.
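A hedged sketch of the per-vector conversion (method names are from org.apache.mahout.math.Vector and org.apache.spark.mllib.linalg.Vectors; toSparkVector is a made-up helper name):
import org.apache.spark.mllib.linalg.Vectors;

// Sketch: copy a Mahout vector element by element into a dense Spark vector.
// (For large sparse vectors, Vectors.sparse would be the better target.)
static org.apache.spark.mllib.linalg.Vector toSparkVector(org.apache.mahout.math.Vector mv) {
    double[] values = new double[mv.size()];
    for (int i = 0; i < mv.size(); i++) {
        values[i] = mv.get(i);
    }
    return Vectors.dense(values);
}
Applied inside a map over sc.sequenceFile(path, Text.class, VectorWritable.class), as in the sketch above, this yields the JavaRDD<Vector> that KMeans.train expects; the element-by-element copy also sidesteps Hadoop's reuse of Writable instances when reading SequenceFiles.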
... View more
06-01-2016
03:07 PM
1 Kudo
In Sandbox 2.4 the default username and password (maria_dev) only give read permission. To get admin permission you need to reset the admin username and password, which you can do by running the script ambari-admin-password-reset. After that you can log in to Ambari with the username and password you just entered, and you will have your admin permission 🙂
... View more
06-01-2016
07:53 AM
eclipse-files.zip @Sandeep Nemuri please find the pom.xml and Java classes in the attachment. And when I run netstat -at | grep 7077 in the virtual machine, it returns nothing.
... View more
05-31-2016
02:35 PM
1 Kudo
sparklog.zip I am running a Spark job as a Java application from Eclipse on a Windows machine, using HDP 2.2 on VirtualBox, but I get the following error: Yarn application has already ended! It might have been killed or unable to launch application master. Please see the complete log in the attachment. I tried to see the job log by running the command yarn logs -applicationId <application ID> but I got this error:
16/05/31 13:36:47 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/05/31 13:36:48 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.24.244.31:8050
/app-logs/root/logs/application_1464699667428_0001 does not exist.
Any ideas?
... View more
05-31-2016
02:18 PM
@Sandeep Nemuri does the following mean anything to you?
[2016-05-31 16:15:37,445][INFO] Application report for application_1464703415943_0002 (state: ACCEPTED)
[2016-05-31 16:15:37,453][DEBUG]
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1464704136551
final status: UNDEFINED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1464703415943_0002/
... View more
05-31-2016
01:38 PM
@Sandeep Nemuri when I try to check the logs with the command yarn logs -applicationId, I get:
16/05/31 13:36:47 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/05/31 13:36:48 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.24.244.31:8050
/app-logs/root/logs/application_1464699667428_0001 does not exist.
Log aggregation has not completed or is not enabled.
... View more
05-31-2016
12:53 PM
@Sandeep Nemuri thank you for all your answers. Do you know what causes the following error?
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
... View more
05-31-2016
11:58 AM
@Jitendra Yadav I can't get the log from the RM UI; when I click on the log it gives Access Denied. Can I get those logs from the terminal?
... View more
05-31-2016
11:32 AM
@Jitendra Yadav I found it, but it's huge: yarn-yarn-nodemanager-sandboxhortonworkscomlogtxt.zip. Should I look for a specific thing in it? For example, I can see this info in it:
STARTUP_MSG: Starting NodeManager
STARTUP_MSG: host = sandbox.hortonworks.com/10.0.2.15
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.7.1.2.3.2.0-2950
As you can see, the host IP is 10.0.2.15, which is not correct. I also see:
Unable to send metrics to collector by address:http://sandbox.hortonworks.com:6188/ws/v1/timeline/metrics
... View more
05-31-2016
11:01 AM
@Jitendra Yadav it does work, thank you, but now I have this error:
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
... View more
05-31-2016
10:54 AM
@Jitendra Yadav, I fixed the jars issue and now it works, but I got a permission error:
org.apache.hadoop.security.AccessControlException: Permission denied: user=A62, access=WRITE, inode="/user/A62/.sparkStaging/application_1464688052729_0003":hdfs:hdfs:drwxr-xr-x
... View more
05-31-2016
10:45 AM
@Sandeep Nemuri ok, thank you, now it works, but I got this error:
org.apache.hadoop.security.AccessControlException: Permission denied: user=A62, access=WRITE, inode="/user/A62/.sparkStaging/application_1464688052729_0002":hdfs:hdfs:drwxr-xr-x
... View more
05-31-2016
09:28 AM
@Sandeep Nemuri thank you for your answer. Where did you get the spark.local.ip value from?
... View more
05-31-2016
07:55 AM
1 Kudo
@Rajkumar Singh, actually that's what I need to do; I need to run the Spark main class as a Java class. Do you have a link explaining how to do that?
... View more
05-30-2016
04:40 PM
@Jitendra Yadav all the jars are there, and I tried to change the version, but I got the same error. Thank you for your answers.
... View more
05-30-2016
03:54 PM
@Jitendra Yadav Done, and it still gives almost the same error:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2120)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:97)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:173)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:338)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
at com.worldline.bfi.labs.sma.test.SparTest.main(SparTest.java:24)
Caused by: org.apache.spark.SparkException: Unable to load YARN support
at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:350)
at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:345)
at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
... 9 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:346)
... 11 more
These are the Spark-related dependencies I added to my project:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.4.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-client</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-assembly_2.10</artifactId>
<version>1.1.1</version>
<type>pom</type>
</dependency>
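A hedged observation based only on the stack trace and the pom above: the missing class org.apache.spark.deploy.yarn.YarnSparkHadoopUtil lives in Spark's YARN module, and the spark-assembly version (1.1.1) does not match spark-core (1.4.1). A sketch of a matching YARN dependency (the exact artifact and version are assumptions to be checked against the cluster's Spark version):
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-yarn_2.10</artifactId>
<version>1.4.1</version>
</dependency>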
... View more
05-30-2016
03:37 PM
@Jitendra Yadav As I am using Maven in my project, I put the conf files in resources, and I changed SparkConf as you suggested. Here are the full logs now:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:1784)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:105)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:180)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:292)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:159)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:232)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
at com.worldline.bfi.labs.sma.test.SparTest.main(SparTest.java:24)
Caused by: org.apache.spark.SparkException: Unable to load YARN support
at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:199)
at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:194)
at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
... 8 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:195)
... 10 more
... View more
05-30-2016
03:11 PM
@Jitendra Yadav thank you for your answer. 1. Should I reference the YARN conf files in my Java code, or just put them on the Eclipse project path? 2. This is my code:
SparkConf conf = new SparkConf().setAppName("sparkForSMA")
.set("spark.master", "yarn-client")
// .set("spark.driver.host", "10.24.246.183");
// .setMaster("spark://sandbox:7077")
// .set("spark.driver.host","sandbox")
.set("spark.local.ip","127.0.0.1")
.set("spark.driver.host","10.24.246.183"); But i get this error . failed to bind to /10.24.246.183:0, shutting down Netty transport
Failed to bind to: /10.24.246.183:0: Service 'sparkDriver' failed after 16 retries!
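A hedged reading of this failure: the 'sparkDriver' service tries to bind a listening socket on spark.driver.host, so that value must be an address actually assigned to the Windows machine and reachable from the VM. A minimal sketch (192.168.56.1 is a hypothetical VirtualBox host-only adapter address; the real value comes from ipconfig):
import org.apache.spark.SparkConf;

// Sketch: spark.driver.host must be a local, VM-reachable interface address.
SparkConf conf = new SparkConf()
        .setAppName("sparkForSMA")
        .set("spark.master", "yarn-client")
        .set("spark.driver.host", "192.168.56.1"); // hypothetical host-only adapter IP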
... View more
05-30-2016
02:36 PM
2 Kudos
Hello, I know this question has been asked before but with no answers, so I am asking it again. I am new to HDP and Hadoop. I managed to install the HDP 2.2 sandbox on VirtualBox, tried a few sample programs, and they work fine from the sandbox. I have installed Eclipse on my Windows machine. At present, I use Maven to package my application and deploy the jar to the HDP sandbox for execution. I would like to execute programs from my Eclipse against the HDP sandbox directly, instead of packaging them every time. A sample of the code I am trying to modify:
SparkConf conf = new SparkConf().setAppName("sparkApp").setMaster("local[2]");
I guess I have to change the local[2] to the master node / YARN cluster URL. How do I get the URL from the sandbox? Are there any other configurations which have to be done on the VirtualBox machine or in my code?
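For reference, a hedged sketch of the change being asked about (assuming the sandbox's core-site.xml and yarn-site.xml are on the Eclipse classpath, e.g. under resources, so the YARN client can locate the ResourceManager):
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Sketch: "yarn-client" instead of "local[2]" keeps the driver in Eclipse
// and launches the executors on the sandbox's YARN cluster; the cluster
// location comes from the Hadoop config files, not from a URL in the code.
SparkConf conf = new SparkConf().setAppName("sparkApp").setMaster("yarn-client");
JavaSparkContext sc = new JavaSparkContext(conf);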
... View more