Confusion when submitting job in yarn-cluster mode

Hi, I have a very simple Spark job,

 

    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    // Configure the job to run in yarn-cluster mode.
    SparkConf conf = new SparkConf()
      .setAppName("Append!")
      .setMaster("yarn-cluster")
      .set("spark.executor.memory", "128m")
      .set("spark.driver.memory", "128m")
      .set("spark.yarn.app.id", "append");
    JavaSparkContext context = new JavaSparkContext(conf);

    // Read a text file from HDFS and append "!" to each line.
    JavaRDD<String> stringJavaRDD = context.textFile("hdfs://localhost:8020/user/brett/hello-world.txt");
    JavaRDD<String> appended = stringJavaRDD.map(new Function<String, String>() {
      @Override
      public String call(String v1) throws Exception {
        return v1 + "!";
      }
    });

    // Collect the results to the driver and print them.
    List<String> output = appended.collect();
    for (String s : output) {
      System.out.println(s);
    }
    context.stop();

 

When I use "yarn-client" the job runs, but when I use "yarn-cluster" I get the following (snipped),

 

15/02/18 20:54:36 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/02/18 20:54:36 INFO netty.NettyBlockTransferService: Server created on 56736
15/02/18 20:54:36 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/02/18 20:54:36 INFO storage.BlockManagerMasterActor: Registering block manager 10.0.2.15:56736 with 945.5 MB RAM, BlockManagerId(<driver>, 10.0.2.15, 56736)
15/02/18 20:54:36 INFO storage.BlockManagerMaster: Registered BlockManager
Exception in thread "main" java.lang.NullPointerException
        at org.apache.spark.deploy.yarn.ApplicationMaster$.sparkContextInitialized(ApplicationMaster.scala:520)
        at org.apache.spark.scheduler.cluster.YarnClusterScheduler.postStartHook(YarnClusterScheduler.scala:46)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:494)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
        at com.bretthoerner.TestSpark.main(DlmApp.java:57)

 

It seems like it's trying to use an ApplicationMaster that hasn't been initialized: https://github.com/apache/spark/blob/branch-1.2/yarn/common/src/main/scala/org/apache/spark/deploy/y...

 

And it's only initialized if I use ApplicationMaster as my main: https://github.com/apache/spark/blob/branch-1.2/yarn/common/src/main/scala/org/apache/spark/deploy/y...

 

What am I missing here? Following the call chain I don't see an obvious way that this would ever work.

 

Thanks!

 

5 REPLIES

Re: Confusion when submitting job in yarn-cluster mode

New Contributor

Hi,

 

I have the exact same error.

Did you find a solution to your problem?

 

Thanks,

 

Rémy.

Re: Confusion when submitting job in yarn-cluster mode

Contributor

I had problems like this on some nodes at startup in previous versions, but not on 5.4.

What version are you using?

Re: Confusion when submitting job in yarn-cluster mode

New Contributor

I got the issue on CDH 5.4.0.

 

I don't use spark-submit at all, but a Java program that executes my Spark job. Everything works correctly if I replace yarn-cluster with yarn-client.

Re: Confusion when submitting job in yarn-cluster mode

New Contributor

Do you think it could come from a "silent" missing dependency?

Re: Confusion when submitting job in yarn-cluster mode

Contributor

spark-submit does multiple things to prepare the submission to YARN. A classpath error could manifest itself as an NPE in the client.

 

Have a look at the YARN logs for errors during driver startup.

Also, can you try running it first with spark-submit, to determine whether the problem is in your cluster or in your submission code?
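As a rough sketch, such a test run might look like the following (the jar path is a placeholder; the class name is taken from the stack trace above, and the memory values mirror the settings in the posted code). Note that in cluster mode the master and driver memory are supplied on the command line rather than via setMaster()/set() in the driver, since the driver itself runs inside the YARN ApplicationMaster:

```shell
# Hypothetical invocation -- substitute your own application jar.
spark-submit \
  --master yarn-cluster \
  --class com.bretthoerner.TestSpark \
  --driver-memory 128m \
  --executor-memory 128m \
  target/my-spark-job.jar
```

If this succeeds where the programmatic submission fails, the problem is in how the Java program launches the job rather than in the cluster itself.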
