How to run Spark application with yarn-client mode from Eclipse

Explorer

Hi all,

We have a Spark application written in Java that uses yarn-client mode. We build the application into a jar file and then run it on the cluster with the spark-submit tool. It works fine and everything runs well on the cluster.

But it is not very easy to test our application directly on the cluster. For every change, even a small one, I have to create a jar file and push it to the cluster. That's why I would like to run the application from my Eclipse (which lives on Windows) against the cluster remotely.

I use the spark-sql_2.11 module and instantiate the SparkSession as follows:

SparkSession.builder()
    .appName("Data Analyzer")
    .master("yarn-client")
    .config("spark.sql.hive.metastore.jars", "builtin")
    .getOrCreate();
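
(By the way, as far as I understand the "yarn-client" master string is deprecated in Spark 2.x; just for reference, the equivalent non-deprecated form would be something like this:)

SparkSession.builder()
    .appName("Data Analyzer")
    .master("yarn")
    .config("spark.submit.deployMode", "client")
    .config("spark.sql.hive.metastore.jars", "builtin")
    .getOrCreate();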

I also copied core-site.xml, hdfs-site.xml, yarn-site.xml, and hive-site.xml from my test cluster (HDP 2.5) and put them on the classpath.

But when running the application from Eclipse I get the following error:

org.apache.spark.SparkException: Unable to load YARN support
    at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:417)
    at org.apache.spark.deploy.SparkHadoopUtil$.yarn$lzycompute(SparkHadoopUtil.scala:412)
    at org.apache.spark.deploy.SparkHadoopUtil$.yarn(SparkHadoopUtil.scala:412)
    at org.apache.spark.deploy.SparkHadoopUtil$.get(SparkHadoopUtil.scala:437)
    at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2223)
    at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:104)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:165)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:256)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:420)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2275)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)

It seems like my application can't find the YARN jars/configs.

Could you please help me understand what I'm doing wrong? Is it possible to run a Java application in yarn-client mode from Eclipse against a remote cluster? And what steps should we follow to make it work?

It would be great if you could share your ideas or give me some hints on how to overcome this issue.

Best regards,

Olga

6 REPLIES

Expert Contributor

Hi @Olga Svyryd

I recently had the chance to attend Spark Summit East 2017. One of the sessions I attended was "No More “Sbt Assembly”: Rethinking Spark-Submit Using CueSheet".

CueSheet has a lot of features, including submitting jobs not only in client mode but also straight to the cluster. The presenter used IntelliJ to demo the project. To dive deeper, please follow the links below.

Link to slides:

https://spark-summit.org/east-2017/events/no-more-sbt-assembly-rethinking-spark-submit-using-cueshee...

Links to the code and documentation:

https://github.com/kakao/cuesheet

https://github.com/jongwook/cuesheet-starter-kit

Explorer

Hi Adnan,

Thanks a lot for sharing the info. Currently I can't move our project to CueSheet, but it is nevertheless interesting to know about.

Best regards,

Olga

Explorer

Hi all,

I changed the configuration to the following:

SparkConf conf = new SparkConf();
conf.set("spark.master", "yarn-client");
conf.set("spark.local.ip", "192.168.144.133");
conf.set("spark.driver.host", "localhost");
conf.set("spark.sql.hive.metastore.jars", "builtin");
conf.setAppName("Data Analyzer");
this.sparkSession = SparkSession.builder().config(conf).getOrCreate();

and updated the dependencies in pom.xml with spark-yarn. In pom.xml I now have the following dependencies:

       <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.10</artifactId>
            <version>2.0.1</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-yarn_2.10</artifactId>
            <version>2.0.1</version>
            <scope>provided</scope>
        </dependency>

I keep core-site.xml/hdfs-site.xml/yarn-site.xml in the src/main/resources folder. And now I have another issue.

Here is the stacktrace:

at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:225)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:250)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
    at org.apache.hadoop.yarn.conf.YarnConfiguration.<clinit>(YarnConfiguration.java:345)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.newConfiguration(YarnSparkHadoopUtil.scala:71)
    at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:54)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.<init>(YarnSparkHadoopUtil.scala:56)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:414)
    at org.apache.spark.deploy.SparkHadoopUtil$.yarn$lzycompute(SparkHadoopUtil.scala:412)
    at org.apache.spark.deploy.SparkHadoopUtil$.yarn(SparkHadoopUtil.scala:412)
    at org.apache.spark.deploy.SparkHadoopUtil$.get(SparkHadoopUtil.scala:437)
    at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:2223)
    at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:104)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:165)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:256)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:420)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2275)

I also noticed that if I remove the Hadoop config files from src/main/resources, the application behaves the same way. So it seems to me like the application ignores them. Should I put them in another folder?
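
A quick way to check whether the files are actually on the runtime classpath (just a throwaway sketch; ClasspathCheck is a scratch class, not part of the real application) would be something like:

import java.net.URL;

import org.apache.hadoop.conf.Configuration;

public class ClasspathCheck {
    public static void main(String[] args) {
        // If this prints null, yarn-site.xml is not on the runtime classpath at all.
        URL yarnSite = ClasspathCheck.class.getClassLoader().getResource("yarn-site.xml");
        System.out.println("yarn-site.xml found at: " + yarnSite);

        // If core-site.xml is picked up, fs.defaultFS should point to the cluster,
        // not to the local default file:///.
        Configuration conf = new Configuration();
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
    }
}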

Best regards,

Olga


Hi @Olga Svyryd, did you get past the issue? I am also facing the same problem.

Hope you can help.

avatar
Explorer

Hi @chitrartha sur,

I resolved the issue with the Hadoop home. Here are the steps that I did:

  1. Download the Spark binary package from http://spark.apache.org/downloads.html
  2. Unpack the Spark archive; in my case it was spark-2.0.2-bin-hadoop2.7
  3. In the Eclipse project, add a new library with all Spark jars (taken from spark-2.0.2-bin-hadoop2.7/jars)
  4. Copy hdfs-site.xml, core-site.xml, and yarn-site.xml from the cluster and put them under src/main/resources
  5. In hdfs-site.xml define the following property:
     <property>
       <name>dfs.client.use.datanode.hostname</name>
       <value>true</value>
     </property>
  6. In the run configuration of the main class, add the SPARK_HOME environment variable: SPARK_HOME=D:/spark-2.0.2-bin-hadoop2.7
  7. In the C:/Windows/System32/drivers/etc/hosts file, add a line with the IP address and hostname of the Hadoop sandbox, e.g. 192.168.144.133 sandbox.hortonworks.com
  8. Then the code goes:

     SparkConf conf = new SparkConf();
     conf.set("spark.master", "yarn-client");
     conf.set("spark.local.ip", "IP_OF_SANDBOX");
     conf.set("spark.driver.host", "IP_OF_MY_LOCAL_WINDOWS_MACHINE");
     conf.set("spark.sql.hive.metastore.jars", "builtin");
     conf.setAppName("Application name");
     this.sparkSession = SparkSession.builder().config(conf).getOrCreate();

     System.setProperty("HADOOP_USER_NAME", "root");
     System.setProperty("SPARK_YARN_MODE", "yarn");
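
For completeness, here is roughly how the pieces from steps 6-8 fit together in one main method (the RemoteYarnClientApp class name is just a placeholder, and the IP values are the same placeholders as above); I set the system properties before the SparkSession is built:

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class RemoteYarnClientApp {
    public static void main(String[] args) {
        // Set before the SparkSession is created so Hadoop/Spark pick them up.
        System.setProperty("HADOOP_USER_NAME", "root");
        System.setProperty("SPARK_YARN_MODE", "yarn");

        SparkConf conf = new SparkConf();
        conf.set("spark.master", "yarn-client");
        conf.set("spark.local.ip", "IP_OF_SANDBOX");                     // e.g. 192.168.144.133
        conf.set("spark.driver.host", "IP_OF_MY_LOCAL_WINDOWS_MACHINE");
        conf.set("spark.sql.hive.metastore.jars", "builtin");
        conf.setAppName("Application name");

        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        System.out.println("Connected, Spark version: " + spark.version());
        spark.stop();
    }
}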

Those steps were enough to connect to the cluster. But I got stuck at the step of submitting Spark jobs. Spark gets a ping from me and starts running, but then the job just hangs with status ACCEPTED. And it lasts forever.

New Contributor

You may try to kill all "running" YARN applications to get past the "ACCEPTED" status.
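
If you prefer doing that from code rather than with the yarn CLI, a rough sketch using the YarnClient API could look like this (KillStuckApps is just a scratch class, and it assumes yarn-site.xml from the cluster is on the classpath):

import java.util.EnumSet;

import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class KillStuckApps {
    public static void main(String[] args) throws Exception {
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new YarnConfiguration());   // reads yarn-site.xml from the classpath
        yarn.start();
        // Kill everything that is still holding or waiting for resources.
        for (ApplicationReport app : yarn.getApplications(
                EnumSet.of(YarnApplicationState.RUNNING, YarnApplicationState.ACCEPTED))) {
            System.out.println("Killing " + app.getApplicationId() + " (" + app.getName() + ")");
            yarn.killApplication(app.getApplicationId());
        }
        yarn.stop();
    }
}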

After that, I hit this error:

18/02/07 15:40:18 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, xxxxxx, 55614, None)

18/02/07 15:40:18 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, xxxxxxx, 55614, None)

18/02/07 15:40:20 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_e12_1517985475199_0009_01_000002 on host: hw-host02. Exit status: 1. Diagnostics: Exception from container-launch. Container id: container_e12_1517985475199_0009_01_000002 Exit code: 1

and it keeps looping like this through all the clients... but all of them fail.