Created 02-21-2017 10:12 AM
Hi all,
We have a Spark application written in Java that uses yarn-client mode. We build the application into a jar file and then run it on the cluster with the spark-submit tool. It works fine and everything runs well on the cluster.
But it is not very easy to test our application directly on the cluster. For every change, even a small one, I have to create a jar file and push it to the cluster. That's why I would like to run the application from my Eclipse (on Windows) against the cluster remotely.
I use the spark-sql_2.11 module and instantiate SparkSession as follows:
SparkSession.builder()
    .appName("Data Analyzer")
    .master("yarn-client")
    .config("spark.sql.hive.metastore.jars", "builtin")
    .getOrCreate();
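(For reference, on Spark 2.x the same setup is usually written with master "yarn" plus an explicit client deploy mode, and when the driver runs from an IDE there is no local SPARK_HOME to upload from, so spark.yarn.jars (or spark.yarn.archive) usually has to point at a copy of the Spark jars on HDFS. A minimal sketch of that variant; the class name and the spark.yarn.jars path below are assumptions for illustration, not something taken from this setup:)

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class RemoteYarnClientApp {                            // hypothetical class, just for this sketch
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("Data Analyzer")
            .setMaster("yarn")                                // Spark 2.x master URL ("yarn-client" is the old 1.x form)
            .set("spark.submit.deployMode", "client")         // keep the driver inside Eclipse
            .set("spark.yarn.jars", "hdfs:///apps/spark2/jars/*.jar")  // assumed HDFS location of the Spark jars
            .set("spark.sql.hive.metastore.jars", "builtin");
        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        System.out.println("Connected, Spark version: " + spark.version());
        spark.stop();
    }
}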
Also I copied core-site.xml, hdfs-site.xml, yarn-site.xml, and hive-site.xml from my test cluster (HDP 2.5) and put them on the classpath.
But when running the application from Eclipse I get the following error:
org.apache.spark.SparkException: Unable to load YARN support
    at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:417)
    at org.apache.spark.deploy.SparkHadoopUtil$.yarn$lzycompute(SparkHadoopUtil.scala:412)
    at org.apache.spark.deploy.SparkHadoopUtil$.yarn(SparkHadoopUtil.scala:412)
    at org.apache.spark.deploy.SparkHadoopUtil$.get(SparkHadoopUtil.scala:437)
    at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2223)
    at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:104)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:165)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:256)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:420)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2275)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
It seems like my application can't find the YARN jars/configs.
Could you please help me understand what I'm doing wrong? Is it possible to run a Java application in yarn-client mode from Eclipse remotely against the cluster? And what steps should we follow to make it work?
It would be great if you could share your ideas or give me some hints on how to overcome this issue.
Best regards,
Olga
Created 02-21-2017 03:38 PM
Hi @Olga Svyryd
I recently had the chance to attend Spark Summit East 2017. One of the sessions I attended was "No More 'sbt assembly': Rethinking Spark-Submit Using CueSheet".
CueSheet has a lot of features, including submitting jobs not only in client mode but also straight to the cluster. The presenter used IntelliJ to demo the project. To dive deeper, please follow the links below.
Link to slides:
Link to code documentation:
Created 02-21-2017 04:25 PM
Hi Adnan,
Thanks a lot for sharing the info. Currently I can't move our project to CueSheet, but it's nevertheless interesting to know about.
Best regards,
Olga
Created 02-22-2017 02:45 PM
Hi all,
I changed the configuration to the following:
SparkConf conf = new SparkConf();
conf.set("spark.master", "yarn-client");
conf.set("spark.local.ip", "192.168.144.133");
conf.set("spark.driver.host", "localhost");
conf.set("spark.sql.hive.metastore.jars", "builtin");
conf.setAppName("Data Analyzer");
this.sparkSession = SparkSession.builder().config(conf).getOrCreate();
and updated the dependencies in pom.xml with spark-yarn. In pom.xml I have the following dependencies:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>2.0.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-yarn_2.10</artifactId>
    <version>2.0.1</version>
    <scope>provided</scope>
</dependency>
I keep core-site.xml/hdfs-site.xml/yarn-site.xml in the src/main/resources folder. And now I have another issue.
Here is the stacktrace:
at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:225)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:250)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
at org.apache.hadoop.yarn.conf.YarnConfiguration.<clinit>(YarnConfiguration.java:345)
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.newConfiguration(YarnSparkHadoopUtil.scala:71)
at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:54)
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.<init>(YarnSparkHadoopUtil.scala:56)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:414)
at org.apache.spark.deploy.SparkHadoopUtil$.yarn$lzycompute(SparkHadoopUtil.scala:412)
at org.apache.spark.deploy.SparkHadoopUtil$.yarn(SparkHadoopUtil.scala:412)
at org.apache.spark.deploy.SparkHadoopUtil$.get(SparkHadoopUtil.scala:437)
at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2223)
at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:104)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:165)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:256)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:420)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2275)
Also I noticed that if I remove the Hadoop config files from src/main/resources, the application behaves the same way. So it seems to me like the application ignores them. Should I put them in another folder?
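(One quick way to check whether the copied XML files are actually visible at the root of the runtime classpath; a minimal sketch, nothing cluster-specific assumed, dropped anywhere before the SparkSession is built:)

String[] cfgs = {"core-site.xml", "hdfs-site.xml", "yarn-site.xml", "hive-site.xml"};
for (String cfg : cfgs) {
    // Hadoop's Configuration loads these as classpath resources, so this mirrors what it will see
    java.net.URL url = Thread.currentThread().getContextClassLoader().getResource(cfg);
    System.out.println(cfg + " -> " + (url != null ? url : "NOT FOUND on classpath"));
}

If any of them print NOT FOUND, Eclipse is not putting src/main/resources on the run classpath, which would explain why removing the files changes nothing.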
Best regards,
Olga
Created 04-16-2017 05:41 PM
Hi @Olga Svyryd, did you resolve the issue? I am also facing the same problem.
Hope you can help.
Created 04-18-2017 09:59 AM
Hi @chitrartha sur,
I resolved the issue with the Hadoop home. Here are the steps that I took:
System.setProperty("HADOOP_USER_NAME", "root"); System.setProperty("SPARK_YARN_MODE", "yarn");
Those steps were enough to connect to the cluster. But I got stuck at the step of submitting Spark jobs. The cluster got a ping from me and the job started running, but then it just hangs with status ACCEPTED, and it stays like that forever.
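(For readers hitting the earlier Shell.checkHadoopHome stack trace on Windows, it usually also takes pointing Hadoop at a local folder containing bin\winutils.exe before anything Hadoop-related is touched; a hedged sketch, where the C:\hadoop path is just an assumed local location:)

// In addition to the two properties above; the path is an assumed local
// folder that contains bin\winutils.exe (not something from this thread).
System.setProperty("hadoop.home.dir", "C:\\hadoop");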
Created 02-07-2018 03:16 PM
You may try to kill all "running" YARN applications to get past the ACCEPTED status.
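If it helps, that can usually be done from a cluster node with yarn application -list to find the stuck application IDs and yarn application -kill <Application ID> for each of them (the <Application ID> placeholder is whatever the list command reports).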
After that, I hit this error:
18/02/07 15:40:18 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, xxxxxx, 55614, None)
18/02/07 15:40:18 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, xxxxxxx, 55614, None)
18/02/07 15:40:20 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_e12_1517985475199_0009_01_000002 on host: hw-host02. Exit status: 1. Diagnostics: Exception from container-launch. Container id: container_e12_1517985475199_0009_01_000002 Exit code: 1
and it keeps looping like that over all the clients... but all of them fail.
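(Exit code 1 with empty diagnostics like this usually means the real cause only shows up in the failed containers' own logs; on the cluster they can typically be pulled with yarn logs -applicationId <application id>, where <application id> is the application the container id above belongs to.)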