Created 06-19-2018 11:45 AM
I have downloaded the latest HDP sandbox Docker image. I am trying to run a Spark job using the YARN REST API approach.
I am using the following payload:
{ "application-id":"application_1528874935802_0047", "application-name":"test", "am-container-spec": { "local-resources": { "entry": [ { "key":"AppMaster.jar", "value": { "resource":"hdfs://<host>:8020/user/spark-examples_2.11-2.2.0.2.6.4.0-91.jar", "type":"FILE", "visibility":"APPLICATION", "size": "43004", "timestamp": "1528878009810" } } ] }, "commands": { "command":"{{JAVA_HOME}}/bin/java -Xmx10m org.apache.spark.deploy.yarn.ApplicationMaster --class 'org.apache.spark.examples.SparkPi' --jar AppMaster.jar 1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr" }, "environment": { "entry": [ { "key": "DISTRIBUTEDSHELLSCRIPTTIMESTAMP", "value": "1528342427276" }, { "key": "CLASSPATH", "value": "{{CLASSPATH}}<CPS>./*<CPS>AppMaster.jar<CPS>{{HADOOP_CONF_DIR}}<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/spark2-client/jars/*<CPS>{{HADOOP_YARN_HOME}}/*<CPS>{{HADOOP_YARN_HOME}}/lib/*<CPS>./log4j.properties" }, { "key": "DISTRIBUTEDSHELLSCRIPTLEN", "value": "6" } ] } }, "unmanaged-AM":"false", "max-app-attempts":"2", "resource": { "memory":"1024", "vCores":"1" }, "application-type":"YARN", "keep-containers-across-application-attempts":"false" }.
The Spark job gets submitted, but it fails with the following error:
Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.conf.Configuration cannot be cast to org.apache.hadoop.yarn.conf.YarnConfiguration
    at org.apache.spark.deploy.yarn.ApplicationMaster.<init>(ApplicationMaster.scala:61)
    at org.apache.spark.deploy.yarn.ApplicationMaster$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:767)
    at org.apache.spark.deploy.SparkHadoopUtil$anon$2.run(SparkHadoopUtil.scala:67)
    at org.apache.spark.deploy.SparkHadoopUtil$anon$2.run(SparkHadoopUtil.scala:66)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:766)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Failing this attempt. Failing the application.
Created 06-20-2018 02:32 AM
How are you submitting the Spark job? Are you using spark-submit?
Please provide the command used to submit the job.
Created 06-20-2018 05:10 AM
Following is the command I use to submit the Spark job:
{{JAVA_HOME}}/bin/java -Xmx1024m org.apache.spark.deploy.yarn.ApplicationMaster --class 'org.apache.spark.examples.SparkPi' --jar AppMaster.jar 1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr
This is the same command used in the payload I posted in the question.
When I use spark-submit in the payload instead, I am able to run the Spark job successfully. But I think this is a hack, as it ends up submitting two different applications: the first one from my YARN REST API call, and then internally a second one from the spark-submit command in the payload. Following is the spark-submit command:
{{SPARK_HOME}}/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 1g --executor-memory 1g --executor-cores 1 /tmp/spark-examples_2.11-2.1.0.2.6.0.3-8.jar 10
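Embedded in the REST payload, this corresponds to a commands entry roughly like the following (a sketch; the log redirections mirror my original payload):

"commands": {
  "command": "{{SPARK_HOME}}/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 1g --executor-memory 1g --executor-cores 1 /tmp/spark-examples_2.11-2.1.0.2.6.0.3-8.jar 10 1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr"
}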
Hence I wanted to run the job successfully using the plain java command.
Created 06-20-2018 05:33 AM
spark-submit is the right way to submit a Spark application, as it sets up the correct classpath for you. If you run it as a plain java program, you have to take care of all of that setup yourself, which gets tricky. I suspect the error you are seeing now is due to incorrect jars on the classpath.
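To confirm that, you could pull the logs of the failed ApplicationMaster attempt, for example (using the application-id from your payload; assumes log aggregation is enabled):

# fetch aggregated container logs for the failed application
yarn logs -applicationId application_1528874935802_0047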
Also, please help me understand the use case. What is the purpose of launching a Spark job via the YARN REST API?
A simple spark-submit will take care of negotiating resources from YARN and running the application.
Created 06-20-2018 06:53 AM
Actually, I want to submit the Spark job remotely, from the host where I am submitting, irrespective of the Spark version installed on the target environment. I don't think that is possible with the spark-submit command, which is why I wanted to get the job running via the plain java command.
My use case is to run a Spark job remotely against any environment, which may have any version of Spark, from my local machine where I have a fixed Spark version (say 2.2.0).
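To illustrate, what I would like to be able to do from my local machine is roughly the following (a sketch; the configuration path and jar location are placeholders):

# point spark-submit at the remote cluster's client configs (placeholder path)
export HADOOP_CONF_DIR=/path/to/remote-cluster-conf
export YARN_CONF_DIR=$HADOOP_CONF_DIR

# submit from the local Spark 2.2.0 install to the remote YARN cluster
$SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster /path/to/spark-examples_2.11-2.2.0.jar 10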
@ssharma, I was referring to https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_yarn-resource-management/content/ch_yarn....
I was also trying to submit the Spark job in the same way. That documentation is for Spark 1.6; now I am trying to do the same with Spark 2.2. Are you aware of any change in this area?