Created 06-11-2016 09:08 PM
When I execute spark-submit --master yarn /usr/hdp/current/spark-client/examples/src/main/python/pi.py on HDP 2.4.2, I get the error below (the same command runs fine on HDP 2.4.0). According to the job log, the container ID container_e03_1465095377475_0007_02_000001 is not recognized by Spark, which causes the java.lang.NumberFormatException: For input string: "e03" error.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/yarn/local/filecache/11/spark-hdp-assembly.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
16/06/11 16:30:39 INFO ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
16/06/11 16:30:39 ERROR ApplicationMaster: Uncaught exception:
java.lang.IllegalArgumentException: Invalid ContainerId: container_e03_1465095377475_0007_02_000001
    at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.getContainerId(YarnSparkHadoopUtil.scala:192)
    at org.apache.spark.deploy.yarn.YarnRMClient.getAttemptId(YarnRMClient.scala:92)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:142)
    at org.apache.spark.deploy.yarn.ApplicationMaster$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:672)
    at org.apache.spark.deploy.SparkHadoopUtil$anon$1.run(SparkHadoopUtil.scala:69)
    at org.apache.spark.deploy.SparkHadoopUtil$anon$1.run(SparkHadoopUtil.scala:68)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:670)
    at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:697)
    at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.lang.NumberFormatException: For input string: "e03"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:589)
    at java.lang.Long.parseLong(Long.java:631)
    at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
    at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
    ... 13 more
16/06/11 16:30:39 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.lang.IllegalArgumentException: Invalid ContainerId: container_e03_1465095377475_0007_02_000001)
16/06/11 16:30:39 INFO ShutdownHookManager: Shutdown hook called
Created 06-12-2016 04:55 AM
It seems there is a difference between the versions of the Hadoop jars (HDP) and the Spark build running on the cluster. Are you running vanilla Spark on the cluster?
Created 06-12-2016 05:15 PM
Hi Raj,
Thank you for the response. It turns out the problem was caused by Phoenix: I had added phoenix-4.7.0-HBase-1.1-client-spark.jar to both spark.executor.extraClassPath and spark.driver.extraClassPath. Now that I'm using the HDP 2.4.2 default jar, phoenix-spark-4.4.0.2.4.2.0-258.jar, the problem has disappeared.
However, with the default jar there is no JDBC support, so a statement like the one below fails; HDP's Phoenix version is too old. I'm hoping HDP can provide an updated Phoenix with JDBC support!
df = sqlContext.read.format("org.apache.phoenix.spark").option("table", "TABLE1").option("zkUrl", "namenode.localdomain:2181:/hbase-unsecure").load()
The error raised by the command above: java.lang.NoClassDefFoundError: org/apache/phoenix/jdbc/PhoenixDriver
Created 06-12-2016 05:27 PM
@dalin qin It looks like the phoenix-client jar is missing here. Could you please try adding it to your submit options, like this:
spark-shell --master yarn-client --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.0.0-169.jar
Created 06-12-2016 05:43 PM
I checked further: the error was actually caused by the ConverterUtils class bundled in phoenix-4.7.0-HBase-1.1-client-spark.jar. I think that build targets Hadoop 2.7.2 while HDP 2.4.2 is still on 2.7.1, and the container ID format has changed between them.
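One quick way to confirm that the Hadoop class really is bundled in the Phoenix jar (assuming the JDK's jar tool is on the PATH and the command is run from the directory containing the jar):
jar tf phoenix-4.7.0-HBase-1.1-client-spark.jar | grep ConverterUtils
If org/apache/hadoop/yarn/util/ConverterUtils.class shows up in the listing, that bundled copy is what shadows the version shipped with HDP's Hadoop on the classpath.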
Created 06-12-2016 05:57 PM
Hi Raj, I already tried that. I'm using pyspark; I added the jars you mentioned to both spark.executor.extraClassPath and spark.driver.extraClassPath and removed the Phoenix 4.7 jar. Now my spark-submit works fine; only loading a DataFrame with the class name "org.apache.phoenix.spark" still fails. Here is what I just tried:
spark-shell --master yarn-client --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.2.0-258.jar

scala> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> "TABLE1", "zkUrl" -> "namenode:2181:/hbase-unsecure"))
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
java.lang.NoClassDefFoundError: org/apache/phoenix/jdbc/PhoenixDriver
    at org.apache.phoenix.spark.PhoenixRDD.<init>(PhoenixRDD.scala:40)
    at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:50)
    at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
    at org.apache.spark.sql.SQLContext.load(SQLContext.scala:1153)
    at $iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:25)
    at $iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:30)
    at $iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:32)
    at $iwC$iwC$iwC$iwC$iwC.<init>(<console>:34)
    at $iwC$iwC$iwC$iwC.<init>(<console>:36)
    at $iwC$iwC$iwC.<init>(<console>:38)
    at $iwC$iwC.<init>(<console>:40)
    at $iwC.<init>(<console>:42)
    at <init>(<console>:44)
    at .<init>(<console>:48)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.phoenix.jdbc.PhoenixDriver
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 51 more
If you open a jar like phoenix-spark-4.4.0.2.4.2.0-258.jar, you will find there are no JDBC classes in it; that is the root cause. If you open the Phoenix 4.7 jar, you will find the JDBC support classes for Spark.
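For reference, the extraClassPath settings I mentioned above were along these lines in spark-defaults.conf (a sketch; the jar paths follow the default HDP layout, so adjust them for your install):
spark.driver.extraClassPath /usr/hdp/current/phoenix-client/phoenix-client.jar:/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.2.0-258.jar
spark.executor.extraClassPath /usr/hdp/current/phoenix-client/phoenix-client.jar:/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.4.2.0-258.jar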
Created 06-12-2016 06:43 PM
I just checked the pom.xml for Phoenix 4.7; it is built against Hadoop 2.5.1, where a container ID looks like container_1465095377475_0007_02_000001, while in Hadoop 2.7.1 a container ID looks like container_e03_1465095377475_0007_02_000001. So the old version of org.apache.hadoop.yarn.util.ConverterUtils.toContainerId cannot handle the new format. I should raise this problem in the Phoenix community as well.
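To illustrate the parsing difference, here is a simplified Python sketch (not the actual ConverterUtils code; the two container IDs are the ones quoted above):
def to_container_id(container_id_str):
    # Rough sketch of what the older-style parser does: it assumes the token
    # right after "container" is the numeric cluster timestamp, followed by
    # the application id, attempt id and container number.
    parts = container_id_str.split("_")
    cluster_timestamp = int(parts[1])  # the epoch token "e03" fails here, like Long.parseLong
    app_id = int(parts[2])
    attempt_id = int(parts[3])
    container_number = int(parts[4])
    return cluster_timestamp, app_id, attempt_id, container_number

for cid in ("container_1465095377475_0007_02_000001",        # old format: parses fine
            "container_e03_1465095377475_0007_02_000001"):   # new format with epoch token
    try:
        print(cid, "->", to_container_id(cid))
    except ValueError as err:
        print(cid, "-> fails, just like Long.parseLong in ConverterUtils:", err)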
Created 06-13-2016 06:00 AM
@dalin qin Yes, you are right. As I mentioned earlier in the thread, there is a difference between the versions of the Hadoop jars (HDP) and the Spark build running on the cluster. The Phoenix jar issue is a separate problem that can be addressed in the Phoenix community.
Created 06-13-2016 01:18 PM
@dalin qin It seems your original issue has been resolved. Could you please select the best answer in the thread so that other users can benefit when referring to it?