Member since: 05-30-2016
Posts: 25
Kudos Received: 5
Solutions: 1
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 1258 | 06-01-2016 03:07 PM
07-20-2016
02:42 PM
@Benjamin Leonhardi
This is what I think I can do:
KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);
JavaRDD<Integer> clusterPoints = clusters.predict(parsedData);
List<Integer> list = clusterPoints.collect(); // collect() replaces the deprecated toArray()
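A hedged sketch of one way to also keep the coordinates together with the cluster assignments (assuming parsedData is the same JavaRDD<Vector> the model was trained on; zip pairs the i-th point with the i-th prediction, which lines up here because predict is a plain map over parsedData):
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.mllib.linalg.Vector;
import scala.Tuple2;

// Sketch: pair every input vector with the id of its assigned cluster.
JavaPairRDD<Vector, Integer> pointsWithCluster = parsedData.zip(clusterPoints);
List<Tuple2<Vector, Integer>> assignments = pointsWithCluster.collect();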
... View more
07-20-2016
02:31 PM
@Benjamin Leonhardi thank you for your answer. Can you please tell me how to extract the cluster information as a List<Integer>, where the list contains the coordinates of the clustered data?
... View more
07-20-2016
12:44 PM
Hello, can you please explain what kind of data I get when I use Spark clustering from MLlib, like the following:
KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);
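For context, a hedged sketch of what the model returned by KMeans.train exposes in the MLlib 1.x Java API (assuming parsedData, numClusters and numIterations are defined as in the snippet above):
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vector;

// Sketch: KMeans.train returns a KMeansModel, i.e. the k learned centroids
// plus methods for assigning points and evaluating the clustering.
KMeansModel clusters = KMeans.train(parsedData.rdd(), numClusters, numIterations);

Vector[] centers = clusters.clusterCenters();          // one centroid Vector per cluster
JavaRDD<Integer> ids = clusters.predict(parsedData);   // cluster id for every input point
double wssse = clusters.computeCost(parsedData.rdd()); // within-set sum of squared errors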
... View more
07-20-2016
09:25 AM
@Marco Gaido thank you for your answer, it's really helpful. Can you please tell me how to store the vectors to HDFS after converting them, and then read them back from HDFS to use them in Spark k-means for clustering, as in KMeansModel clusters = KMeans.train
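A hedged sketch of one simple round trip, assuming the converted vectors are in a List<Vector> called points and sc is an existing JavaSparkContext: since org.apache.spark.mllib.linalg.Vector is Java-serializable, saveAsObjectFile and objectFile can store and reload them without any Writable wrappers (the HDFS path is a made-up example):
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vector;

// Sketch: write the vectors to HDFS, then read them back for k-means.
JavaRDD<Vector> vectors = sc.parallelize(points);
vectors.saveAsObjectFile("hdfs:///user/me/kmeans-input"); // hypothetical path

JavaRDD<Vector> reloaded = sc.objectFile("hdfs:///user/me/kmeans-input");
KMeansModel model = KMeans.train(reloaded.rdd(), numClusters, numIterations);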
... View more
07-20-2016
07:27 AM
@Arun A K thank you for your answer. First, I have a Vector, not a List of RDDs; second, I am using Java.
... View more
07-19-2016
03:03 PM
I wrote Vectors (org.apache.spark.mllib.linalg.Vector) to HDFS as follows:
public void writePointsToFile(Path path, FileSystem fs, Configuration conf,
        List<Vector> points) throws IOException {
    // Note: SequenceFile values are normally expected to be Writable;
    // org.apache.spark.mllib.linalg.Vector is not, so this append may
    // fail at runtime unless a matching serialization is configured.
    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
            Writer.file(path), Writer.keyClass(LongWritable.class),
            Writer.valueClass(Vector.class));
    long recNum = 0;
    for (Vector point : points) {
        writer.append(new LongWritable(recNum++), point);
    }
    writer.close();
}
(I am not sure I did that the right way; I can't test it yet.) Now I need to read this file back as a JavaRDD<Vector>, because I want to use it in Spark k-means clustering, but I don't know how to do this.
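A hedged sketch of the read side, with one caveat: SequenceFile values normally have to be Writable, and Spark's mllib Vector is not, so the writer above would need the vectors wrapped in something like Mahout's VectorWritable (as in the related post below). Assuming that wrapping, an existing JavaSparkContext sc, and path being the Path the file was written to:
import org.apache.hadoop.io.LongWritable;
import org.apache.mahout.math.VectorWritable;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

// Sketch: read (key, VectorWritable) records and convert each value
// into a Spark mllib Vector.
JavaPairRDD<LongWritable, VectorWritable> records =
        sc.sequenceFile(path.toString(), LongWritable.class, VectorWritable.class);
JavaRDD<Vector> vectors = records.map(rec -> {
    org.apache.mahout.math.Vector mv = rec._2().get();
    double[] values = new double[mv.size()];
    for (int i = 0; i < mv.size(); i++) {
        values[i] = mv.get(i); // copy: Hadoop reuses Writable instances
    }
    return Vectors.dense(values);
});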
... View more
07-19-2016
11:56 AM
I have a VectorWritable (org.apache.mahout.math.VectorWritable) which is coming from a sequence file generated by Mahout, written by something like the following:
public void write(List<Vector> points, int clustersNumber, HdfsConnector connector) throws IOException {
    this.writePointsToFile(new Path(connector.getPointsInput(), "pointsInput"),
            connector.getFs(), connector.getConf(), points);
    Path clusterCentroids = new Path(connector.getClustersInput(), "part-0");
    SequenceFile.Writer writer = SequenceFile.createWriter(
            connector.getConf(), Writer.file(clusterCentroids),
            Writer.keyClass(Text.class), Writer.valueClass(Kluster.class));
    List<Vector> centroids = getCentroids();
    for (int i = 0; i < centroids.size(); i++) {
        Vector vect = centroids.get(i);
        Kluster centroidCluster = new Kluster(vect, i, new SquaredEuclideanDistanceMeasure());
        writer.append(new Text(centroidCluster.getIdentifier()), centroidCluster);
    }
    writer.close();
}
and I would like to convert that into the Spark Vector type (org.apache.spark.mllib.linalg.Vectors) as a JavaRDD<Vector>. How can I do that in Java? I've read something about sequenceFile in Spark, but I couldn't figure out how to do it.
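A hedged sketch of the per-vector conversion (method names are from org.apache.mahout.math.Vector and org.apache.spark.mllib.linalg.Vectors; toSparkVector is a made-up helper name):
import org.apache.spark.mllib.linalg.Vectors;

// Sketch: copy a Mahout vector element by element into a dense Spark vector.
// (For large sparse vectors, Vectors.sparse would be the better target.)
static org.apache.spark.mllib.linalg.Vector toSparkVector(org.apache.mahout.math.Vector mv) {
    double[] values = new double[mv.size()];
    for (int i = 0; i < mv.size(); i++) {
        values[i] = mv.get(i);
    }
    return Vectors.dense(values);
}
Applied inside a map over sc.sequenceFile(path, Text.class, VectorWritable.class), as in the sketch above, this yields the JavaRDD<Vector> that KMeans.train expects; the element-by-element copy also sidesteps Hadoop's reuse of Writable instances when reading SequenceFiles.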
... View more
06-01-2016
03:07 PM
1 Kudo
In Sandbox 2.4 the default username and password (maria_dev) only give read permission. To get admin permission you need to reset the admin username and password, which you can do by running the script ambari-admin-password-reset. After that you can log in to Ambari with the username and password you just entered, and you will have your admin permission 🙂
... View more
06-01-2016
07:53 AM
eclipse-files.zip @Sandeep Nemuri please find the pom.xml and Java classes in the attachment. And when I run netstat -at | grep 7077 in the virtual machine, it returns nothing.
... View more
05-31-2016
02:35 PM
1 Kudo
sparklog.zip I am running a Spark job as a Java application from Eclipse on a Windows machine, using HDP 2.2 on VirtualBox, but I get the following error: Yarn application has already ended! It might have been killed or unable to launch application master. Please see the complete log in the attachment. I tried to see the job log by running the command yarn logs -applicationId <application ID> but I got this error:
16/05/31 13:36:47 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/05/31 13:36:48 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.24.244.31:8050
/app-logs/root/logs/application_1464699667428_0001 does not exist.
Any ideas?
... View more
05-31-2016
02:18 PM
@Sandeep Nemuri does the following mean anything to you?
[2016-05-31 16:15:37,445][INFO] Application report for application_1464703415943_0002 (state: ACCEPTED)
[2016-05-31 16:15:37,453][DEBUG]
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1464704136551
final status: UNDEFINED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1464703415943_0002/
... View more
05-31-2016
01:38 PM
@Sandeep Nemuri when I try to check the logs with the command yarn logs -applicationId, I get:
16/05/31 13:36:47 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/05/31 13:36:48 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.24.244.31:8050
/app-logs/root/logs/application_1464699667428_0001 does not exist.
Log aggregation has not completed or is not enabled.
... View more
05-31-2016
12:53 PM
@Sandeep Nemuri thank you for all your answers. Do you know what causes the following error?
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
... View more
05-31-2016
11:58 AM
@Jitendra Yadav I can't get the log from the RM UI; when I click on the log it gives Access Denied. Can I get those logs from the terminal?
... View more
05-31-2016
11:32 AM
@Jitendra Yadav I found it, but it's huge: yarn-yarn-nodemanager-sandboxhortonworkscomlogtxt.zip. Should I look for a specific thing in it? For example, I can see this info in it:
STARTUP_MSG: Starting NodeManager
STARTUP_MSG: host = sandbox.hortonworks.com/10.0.2.15
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.7.1.2.3.2.0-2950
As you can see, the host IP is 10.0.2.15, which is not correct. I also see:
Unable to send metrics to collector by address:http://sandbox.hortonworks.com:6188/ws/v1/timeline/metrics
... View more
05-31-2016
11:01 AM
@Jitendra Yadav it does work, thank you, but now I have this error:
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
... View more
05-31-2016
10:54 AM
@Jitendra Yadav, I fixed the jars issue and now it works, but I got a permission error:
org.apache.hadoop.security.AccessControlException: Permission denied: user=A62, access=WRITE, inode="/user/A62/.sparkStaging/application_1464688052729_0003":hdfs:hdfs:drwxr-xr-x
... View more
05-31-2016
10:45 AM
@Sandeep Nemuri ok, thank you, now it works, but I got this error:
org.apache.hadoop.security.AccessControlException: Permission denied: user=A62, access=WRITE, inode="/user/A62/.sparkStaging/application_1464688052729_0002":hdfs:hdfs:drwxr-xr-x
... View more
05-31-2016
09:28 AM
@Sandeep Nemuri thank you for your answer. Where did you get the spark.local.ip value from?
... View more
05-31-2016
07:55 AM
1 Kudo
@Rajkumar Singh, actually that's what I need to do; I need to run the Spark main class as a Java class. Do you have a link explaining how to do that?
... View more
05-30-2016
04:40 PM
@Jitendra Yadav all the jars are there, and I tried to change the version, but I got the same error. Thank you for your answers.
... View more
05-30-2016
03:54 PM
@Jitendra Yadav Done, and it still gives almost the same error:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2120)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:97)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:173)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:338)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
at com.worldline.bfi.labs.sma.test.SparTest.main(SparTest.java:24)
Caused by: org.apache.spark.SparkException: Unable to load YARN support
at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:350)
at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:345)
at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
... 9 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:346)
... 11 more
These are the Spark-related dependencies I added to my project:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.4.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-client</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-assembly_2.10</artifactId>
<version>1.1.1</version>
<type>pom</type>
</dependency>
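A hedged observation based only on the stack trace and the pom above: the missing class org.apache.spark.deploy.yarn.YarnSparkHadoopUtil lives in Spark's YARN module, and the spark-assembly version (1.1.1) does not match spark-core (1.4.1). A sketch of a matching YARN dependency (the exact artifact and version are assumptions to be checked against the cluster's Spark version):
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-yarn_2.10</artifactId>
<version>1.4.1</version>
</dependency>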
... View more
05-30-2016
03:37 PM
@Jitendra Yadav As I am using Maven in my project, I put the conf files in resources, and I changed SparkConf as you suggested. Here are the full logs now:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:1784)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:105)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:180)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:292)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:159)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:232)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
at com.worldline.bfi.labs.sma.test.SparTest.main(SparTest.java:24)
Caused by: org.apache.spark.SparkException: Unable to load YARN support
at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:199)
at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:194)
at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
... 8 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:195)
... 10 more
... View more
05-30-2016
03:11 PM
@Jitendra Yadav thank you for your answer. 1. Should I reference the YARN conf files in my Java code, or just put them on the Eclipse project path? 2. This is my code:
SparkConf conf = new SparkConf().setAppName("sparkForSMA")
.set("spark.master", "yarn-client")
// .set("spark.driver.host", "10.24.246.183");
// .setMaster("spark://sandbox:7077")
// .set("spark.driver.host","sandbox")
.set("spark.local.ip","127.0.0.1")
.set("spark.driver.host","10.24.246.183"); But i get this error . failed to bind to /10.24.246.183:0, shutting down Netty transport
Failed to bind to: /10.24.246.183:0: Service 'sparkDriver' failed after 16 retries!
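A hedged reading of this failure: the 'sparkDriver' service tries to bind a listening socket on spark.driver.host, so that value must be an address actually assigned to the Windows machine and reachable from the VM. A minimal sketch (192.168.56.1 is a hypothetical VirtualBox host-only adapter address; the real value comes from ipconfig):
import org.apache.spark.SparkConf;

// Sketch: spark.driver.host must be a local, VM-reachable interface address.
SparkConf conf = new SparkConf()
        .setAppName("sparkForSMA")
        .set("spark.master", "yarn-client")
        .set("spark.driver.host", "192.168.56.1"); // hypothetical host-only adapter IP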
... View more
05-30-2016
02:36 PM
2 Kudos
Hello, I know this question has been asked before but with no answers, so I am asking it again. I am new to HDP and Hadoop. I managed to install the HDP 2.2 sandbox on VirtualBox, tried a few sample programs, and they work fine from the sandbox. I have installed Eclipse on my Windows machine. At present, I use Maven to package my application and deploy the jar to the HDP sandbox for execution. I would like to execute programs from my Eclipse against the HDP sandbox directly, instead of packaging them every time. A sample of the code I am trying to modify:
SparkConf conf = new SparkConf().setAppName("sparkApp").setMaster("local[2]");
I guess I have to change the local[2] to the master node / YARN cluster URL. How do I get the URL from the sandbox? Are there any other configurations which have to be done on the VirtualBox machine or in my code?
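For reference, a hedged sketch of the change being asked about (assuming the sandbox's core-site.xml and yarn-site.xml are on the Eclipse classpath, e.g. under resources, so the YARN client can locate the ResourceManager):
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Sketch: "yarn-client" instead of "local[2]" keeps the driver in Eclipse
// and launches the executors on the sandbox's YARN cluster; the cluster
// location comes from the Hadoop config files, not from a URL in the code.
SparkConf conf = new SparkConf().setAppName("sparkApp").setMaster("yarn-client");
JavaSparkContext sc = new JavaSparkContext(conf);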
... View more