2.4.2 spark-submit got Invalid ContainerId

Contributor

When I execute spark-submit --master yarn /usr/hdp/current/spark-client/examples/src/main/python/pi.py on HDP 2.4.2, I get the error below (the same command causes no error on HDP 2.4.0). According to the following job log, the container ID container_e03_1465095377475_0007_02_000001 is not recognized by Spark, which causes the java.lang.NumberFormatException: For input string: "e03" error.

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/yarn/local/filecache/11/spark-hdp-assembly.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
16/06/11 16:30:39 INFO ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
16/06/11 16:30:39 ERROR ApplicationMaster: Uncaught exception: 
java.lang.IllegalArgumentException: Invalid ContainerId: container_e03_1465095377475_0007_02_000001
	at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
	at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.getContainerId(YarnSparkHadoopUtil.scala:192)
	at org.apache.spark.deploy.yarn.YarnRMClient.getAttemptId(YarnRMClient.scala:92)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:142)
	at org.apache.spark.deploy.yarn.ApplicationMaster$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:672)
	at org.apache.spark.deploy.SparkHadoopUtil$anon$1.run(SparkHadoopUtil.scala:69)
	at org.apache.spark.deploy.SparkHadoopUtil$anon$1.run(SparkHadoopUtil.scala:68)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:670)
	at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:697)
	at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.lang.NumberFormatException: For input string: "e03"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Long.parseLong(Long.java:589)
	at java.lang.Long.parseLong(Long.java:631)
	at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
	at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
	... 13 more
16/06/11 16:30:39 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.lang.IllegalArgumentException: Invalid ContainerId: container_e03_1465095377475_0007_02_000001)
16/06/11 16:30:39 INFO ShutdownHookManager: Shutdown hook called
8 REPLIES

Super Guru

It seems there is a mismatch between the versions of the Hadoop jars (HDP) and the Spark build running on the cluster. Are you running vanilla Spark on the cluster?

Contributor

Hi Raj,

Thank you for the response. It turns out the problem was caused by Phoenix: I had added phoenix-4.7.0-HBase-1.1-client-spark.jar to both spark.executor.extraClassPath and spark.driver.extraClassPath. After switching to the HDP 2.4.2 default jar, phoenix-spark-4.4.0.2.4.2.0-258.jar, the problem disappeared.
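
For anyone hitting the same thing, these are the two properties involved; a minimal sketch of both forms (the Phoenix 4.7 jar path below is a placeholder, point it at wherever your copy lives):

# spark-defaults.conf entries that pulled the Phoenix 4.7 client jar onto the classpath
spark.driver.extraClassPath     /usr/local/phoenix/phoenix-4.7.0-HBase-1.1-client-spark.jar
spark.executor.extraClassPath   /usr/local/phoenix/phoenix-4.7.0-HBase-1.1-client-spark.jar

# equivalent one-off form on the spark-submit command line
spark-submit --master yarn \
  --conf spark.driver.extraClassPath=/usr/local/phoenix/phoenix-4.7.0-HBase-1.1-client-spark.jar \
  --conf spark.executor.extraClassPath=/usr/local/phoenix/phoenix-4.7.0-HBase-1.1-client-spark.jar \
  /usr/hdp/current/spark-client/examples/src/main/python/pi.py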

However, with the default jar there is no JDBC support, so a statement like the one below fails; HDP's Phoenix version is too old. I really hope HDP can provide a Phoenix update with JDBC support!

df = sqlContext.read.format("org.apache.phoenix.spark").option("table", "TABLE1").option("zkUrl", "namenode.localdomain:2181:/hbase-unsecure").load()

The error raised by the command above: java.lang.NoClassDefFoundError: org/apache/phoenix/jdbc/PhoenixDriver

Super Guru

@dalin qin It looks like the phoenix-client jar is missing here. Could you please try adding it to your submit options like this:

spark-shell --master yarn-client --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.0.0-169.jar

Contributor

I checked further: the error was actually caused by the ConverterUtils class bundled in phoenix-4.7.0-HBase-1.1-client-spark.jar. I think that jar targets Hadoop 2.7.2 while HDP 2.4.2 is still on 2.7.1, and the container ID format has changed between versions.
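
A quick way to confirm that the client jar bundles its own copy of that class is to list the jar contents; a small check along these lines (jar locations are assumptions, adjust to where they live on your nodes):

# does the Phoenix 4.7 client-spark jar ship its own ConverterUtils?
unzip -l /usr/local/phoenix/phoenix-4.7.0-HBase-1.1-client-spark.jar | grep 'yarn/util/ConverterUtils'

# compare with the ConverterUtils provided by HDP's own YARN jars
unzip -l /usr/hdp/2.4.2.0-258/hadoop-yarn/hadoop-yarn-common-*.jar | grep 'yarn/util/ConverterUtils'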

Contributor

Hi Raj, I already tried that. I'm using pyspark; I added the jars you mentioned to both spark.executor.extraClassPath and spark.driver.extraClassPath and removed the Phoenix 4.7 jar, and now my spark-submit works fine. Only loading a DataFrame by specifying the class name "org.apache.phoenix.spark" is still not working. Here is what I just did:

spark-shell --master yarn-client --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.2.0-258.jar

scala> val df = sqlContext.load(  "org.apache.phoenix.spark",  Map("table" -> "TABLE1", "zkUrl" -> "namenode:2181:/hbase-unsecure"))
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
java.lang.NoClassDefFoundError: org/apache/phoenix/jdbc/PhoenixDriver
        at org.apache.phoenix.spark.PhoenixRDD.<init>(PhoenixRDD.scala:40)
        at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:50)
        at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1153)
        at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:25)
        at $iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:30)
        at $iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:32)
        at $iwC$iwC$iwC$iwC$iwC.<init>(<console>:34)
        at $iwC$iwC$iwC$iwC.<init>(<console>:36)
        at $iwC$iwC$iwC.<init>(<console>:38)
        at $iwC$iwC.<init>(<console>:40)
        at $iwC.<init>(<console>:42)
        at <init>(<console>:44)
        at .<init>(<console>:48)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.phoenix.jdbc.PhoenixDriver
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 51 more

If you open a jar like phoenix-spark-4.4.0.2.4.2.0-258.jar, you will find there are no JDBC classes inside; that is the root cause. If you open the Phoenix 4.7 jar, you will find the JDBC support classes for Spark.
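
To verify this yourself, listing the jar contents shows the difference; for example (the first path is the one used above, the second is a placeholder for wherever the 4.7 jar lives):

# the HDP 2.4.2 phoenix-spark jar contains no JDBC driver classes
unzip -l /usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.2.0-258.jar | grep -i 'jdbc'

# the Phoenix 4.7 client-spark jar does include org/apache/phoenix/jdbc/PhoenixDriver
unzip -l /usr/local/phoenix/phoenix-4.7.0-HBase-1.1-client-spark.jar | grep 'jdbc/PhoenixDriver'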

Contributor

I just checked the pom.xml for Phoenix 4.7: it is built against Hadoop 2.5.1, where a container ID looks like container_1465095377475_0007_02_000001, while in Hadoop 2.7.1 a container ID looks like container_e03_1465095377475_0007_02_000001. So the old version of org.apache.hadoop.yarn.util.ConverterUtils.toContainerId cannot handle the new-format container IDs. I should raise this problem with the Phoenix community as well.
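
For anyone who wants to double-check this on their own cluster: the e<epoch> prefix (the "e03" part) generally only appears on Hadoop 2.6+ after the ResourceManager has been restarted with recovery enabled, so a rough sanity check looks like this (the application ID is just the one from my failed run):

# which Hadoop release is the cluster actually running?
hadoop version | head -1

# container IDs recorded for the failed application; on this HDP 2.4.2 cluster they carry the epoch prefix
# (requires YARN log aggregation to be enabled)
yarn logs -applicationId application_1465095377475_0007 | grep -o 'container_[A-Za-z0-9_]*' | sort -u | head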

Super Guru

@dalin qin Yes, you are right. As I mentioned earlier in the thread, there is a mismatch between the versions of the Hadoop (HDP) jars and the Spark build running on the cluster. The Phoenix jar issue is a separate problem that can be addressed in the Phoenix community.

Super Guru

@dalin qin It seems your original issue has been resolved. Could you please select the best answer in the thread so that other users can benefit when referring to it?