<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: 2.4.2 spark-submit got Invalid ContainerId in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134305#M96967</link>
    <description>&lt;P&gt;Just checked the pom.xml file for Phoenix 4.7; it's based on Hadoop 2.5.1, where a container ID looks like container_1465095377475_0007_02_000001, while in Hadoop 2.7.1 a container ID looks like container_&lt;STRONG&gt;e03&lt;/STRONG&gt;_1465095377475_0007_02_000001. So the old version of the class org.apache.hadoop.yarn.util.ConverterUtils.toContainerId couldn't handle the new version's container ID. I should raise this problem in the Phoenix community as well.&lt;/P&gt;</description>
    <pubDate>Mon, 13 Jun 2016 01:43:22 GMT</pubDate>
    <dc:creator>dblive</dc:creator>
    <dc:date>2016-06-13T01:43:22Z</dc:date>
    <item>
      <title>2.4.2 spark-submit got Invalid ContainerId</title>
      <link>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134299#M96961</link>
      <description>&lt;P&gt;When I execute&lt;STRONG&gt; spark-submit --master yarn /usr/hdp/current/spark-client/examples/src/main/python/pi.py &lt;/STRONG&gt;on HDP 2.4.2, I get the error below (which doesn't occur on HDP 2.4.0). According to the following job log, it seems we got &lt;STRONG&gt;container_e03_1465095377475_0007_02_000001&lt;/STRONG&gt;, which isn't recognized by Spark and causes a &lt;B&gt;java.lang.NumberFormatException: For input string: "e03" error&lt;/B&gt;&lt;/P&gt;&lt;PRE&gt;SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/yarn/local/filecache/11/spark-hdp-assembly.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See &lt;A href="http://www.slf4j.org/codes.html#multiple_bindings" target="_blank"&gt;http://www.slf4j.org/codes.html#multiple_bindings&lt;/A&gt; for an explanation.
16/06/11 16:30:39 INFO ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
16/06/11 16:30:39 ERROR ApplicationMaster: Uncaught exception: 
java.lang.IllegalArgumentException: Invalid ContainerId: container_e03_1465095377475_0007_02_000001
	at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
	at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.getContainerId(YarnSparkHadoopUtil.scala:192)
	at org.apache.spark.deploy.yarn.YarnRMClient.getAttemptId(YarnRMClient.scala:92)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:142)
	at org.apache.spark.deploy.yarn.ApplicationMaster$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:672)
	at org.apache.spark.deploy.SparkHadoopUtil$anon$1.run(SparkHadoopUtil.scala:69)
	at org.apache.spark.deploy.SparkHadoopUtil$anon$1.run(SparkHadoopUtil.scala:68)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:670)
	at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:697)
	at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.lang.NumberFormatException: For input string: "e03"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Long.parseLong(Long.java:589)
	at java.lang.Long.parseLong(Long.java:631)
	at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
	at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
	... 13 more
16/06/11 16:30:39 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.lang.IllegalArgumentException: Invalid ContainerId: container_e03_1465095377475_0007_02_000001)
16/06/11 16:30:39 INFO ShutdownHookManager: Shutdown hook called
&lt;/PRE&gt;</description>
      <pubDate>Sun, 12 Jun 2016 04:08:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134299#M96961</guid>
      <dc:creator>dblive</dc:creator>
      <dc:date>2016-06-12T04:08:10Z</dc:date>
    </item>
    <item>
      <title>Re: 2.4.2 spark-submit got Invalid ContainerId</title>
      <link>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134300#M96962</link>
      <description>&lt;P&gt;It seems there is a difference between the versions of the Hadoop jars (HDP) and the Spark running on the cluster. Are you running vanilla Spark on the cluster?&lt;/P&gt;</description>
      <pubDate>Sun, 12 Jun 2016 11:55:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134300#M96962</guid>
      <dc:creator>rajkumar_singh</dc:creator>
      <dc:date>2016-06-12T11:55:10Z</dc:date>
    </item>
    <item>
      <title>Re: 2.4.2 spark-submit got Invalid ContainerId</title>
      <link>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134301#M96963</link>
      <description>&lt;P&gt;Hi Raj,&lt;/P&gt;&lt;P&gt;Thank you for the response. It turns out it was caused by Phoenix: I had added phoenix-4.7.0-HBase-1.1-client-spark.jar to both &lt;STRONG&gt;spark.executor.extraClassPath&lt;/STRONG&gt; and &lt;STRONG&gt;spark.driver.extraClassPath&lt;/STRONG&gt;. Now that I'm using the HDP 2.4.2 default jar &lt;STRONG&gt;phoenix-spark-4.4.0.2.4.2.0-258.jar&lt;/STRONG&gt;, the problem has disappeared.&lt;/P&gt;&lt;P&gt;However, with the default jar there is no JDBC support to execute a statement like the one below; HDP's Phoenix version is too old! I'm hoping that HDP could provide an update for Phoenix to support JDBC!&lt;/P&gt;&lt;PRE&gt;df = sqlContext.read.format("org.apache.phoenix.spark").option("table", "TABLE1").option("zkUrl", "namenode.localdomain:2181:/hbase-unsecure").load()&lt;/PRE&gt;&lt;P&gt;The error raised by the command above: java.lang.NoClassDefFoundError: org/apache/phoenix/jdbc/PhoenixDriver&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jun 2016 00:15:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134301#M96963</guid>
      <dc:creator>dblive</dc:creator>
      <dc:date>2016-06-13T00:15:42Z</dc:date>
    </item>
    <item>
      <title>Re: 2.4.2 spark-submit got Invalid ContainerId</title>
      <link>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134302#M96964</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10779/dalinqin.html" nodeid="10779"&gt;@dalin qin&lt;/A&gt; it looks that phoenix-client jar is missing here, could you please try adding it with your submit options like this&lt;/P&gt;&lt;P&gt;spark-shell --master yarn-client --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.0.0-169.jar&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jun 2016 00:27:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134302#M96964</guid>
      <dc:creator>rajkumar_singh</dc:creator>
      <dc:date>2016-06-13T00:27:26Z</dc:date>
    </item>
    <item>
      <title>Re: 2.4.2 spark-submit got Invalid ContainerId</title>
      <link>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134303#M96965</link>
      <description>&lt;P&gt;I checked further: the error was actually caused by ConverterUtils.class in phoenix-4.7.0-HBase-1.1-client-spark.jar, which I think supports Hadoop 2.7.2 while HDP 2.4.2 is still using 2.7.1; the container ID format has changed between versions.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jun 2016 00:43:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134303#M96965</guid>
      <dc:creator>dblive</dc:creator>
      <dc:date>2016-06-13T00:43:11Z</dc:date>
    </item>
    <item>
      <title>Re: 2.4.2 spark-submit got Invalid ContainerId</title>
      <link>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134304#M96966</link>
      <description>&lt;P&gt;Hi Raj, I already tried that. I'm using pyspark; I added the jars you mentioned to both &lt;STRONG&gt;spark.executor.extraClassPath&lt;/STRONG&gt; and &lt;STRONG&gt;spark.driver.extraClassPath&lt;/STRONG&gt; and removed Phoenix 4.7. Now my spark-submit is working fine; only loading a DataFrame by specifying the class name "org.apache.phoenix.spark" is not working. The following is what I just did:&lt;/P&gt;&lt;PRE&gt;spark-shell --master yarn-client --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.2.0-258.jar

scala&amp;gt; val df = sqlContext.load(  "org.apache.phoenix.spark",  Map("table" -&amp;gt; "TABLE1", "zkUrl" -&amp;gt; "namenode:2181:/hbase-unsecure"))
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
java.lang.NoClassDefFoundError: org/apache/phoenix/jdbc/PhoenixDriver
        at org.apache.phoenix.spark.PhoenixRDD.&amp;lt;init&amp;gt;(PhoenixRDD.scala:40)
        at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:50)
        at org.apache.spark.sql.execution.datasources.LogicalRelation.&amp;lt;init&amp;gt;(LogicalRelation.scala:37)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1153)
        at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:25)
        at $iwC$iwC$iwC$iwC$iwC$iwC$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:30)
        at $iwC$iwC$iwC$iwC$iwC$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:32)
        at $iwC$iwC$iwC$iwC$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:34)
        at $iwC$iwC$iwC$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:36)
        at $iwC$iwC$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:38)
        at $iwC$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:40)
        at $iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:42)
        at &amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:44)
        at .&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:48)
        at .&amp;lt;clinit&amp;gt;(&amp;lt;console&amp;gt;)
        at .&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:7)
        at .&amp;lt;clinit&amp;gt;(&amp;lt;console&amp;gt;)
        at $print(&amp;lt;console&amp;gt;)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.phoenix.jdbc.PhoenixDriver
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 51 more&lt;/PRE&gt;&lt;P&gt;If you open a jar file like phoenix-spark-4.4.0.2.4.2.0-258.jar, you will find that no JDBC classes exist; that's the root cause. If you open the Phoenix 4.7 jar, you will find JDBC support classes for Spark.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jun 2016 00:57:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134304#M96966</guid>
      <dc:creator>dblive</dc:creator>
      <dc:date>2016-06-13T00:57:23Z</dc:date>
    </item>
    <item>
      <title>Re: 2.4.2 spark-submit got Invalid ContainerId</title>
      <link>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134305#M96967</link>
      <description>&lt;P&gt;Just checked the pom.xml file for Phoenix 4.7; it's based on Hadoop 2.5.1, where a container ID looks like container_1465095377475_0007_02_000001, while in Hadoop 2.7.1 a container ID looks like container_&lt;STRONG&gt;e03&lt;/STRONG&gt;_1465095377475_0007_02_000001. So the old version of the class org.apache.hadoop.yarn.util.ConverterUtils.toContainerId couldn't handle the new version's container ID. I should raise this problem in the Phoenix community as well.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jun 2016 01:43:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134305#M96967</guid>
      <dc:creator>dblive</dc:creator>
      <dc:date>2016-06-13T01:43:22Z</dc:date>
    </item>
    <item>
      <title>Re: 2.4.2 spark-submit got Invalid ContainerId</title>
      <link>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134306#M96968</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10779/dalinqin.html" nodeid="10779"&gt;@dalin qin&lt;/A&gt; yes, you are right here as I told you earlier in the thread that there is difference in versions of hadoop jars(hdp) and Spark running on the cluster. the phoenix jar issue is a different issue which can be addressed in phoenix community.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jun 2016 13:00:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134306#M96968</guid>
      <dc:creator>rajkumar_singh</dc:creator>
      <dc:date>2016-06-13T13:00:17Z</dc:date>
    </item>
    <item>
      <title>Re: 2.4.2 spark-submit got Invalid ContainerId</title>
      <link>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134307#M96969</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10779/dalinqin.html" nodeid="10779"&gt;@dalin qin&lt;/A&gt; seems your original issue has been resolved, could you please select the best answer among the thread so that other user get benefit while referrring this thread.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jun 2016 20:18:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/2-4-2-spark-submit-got-Invalid-ContainerId/m-p/134307#M96969</guid>
      <dc:creator>rajkumar_singh</dc:creator>
      <dc:date>2016-06-13T20:18:44Z</dc:date>
    </item>
  </channel>
</rss>

