Created 08-30-2018 03:42 PM
Hi,
My spark structured streaming jobs working in HDP2.6 failed in HDP3.0:
java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem cannot access its superinterface org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:763) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:467) at java.net.URLClassLoader.access$100(URLClassLoader.java:73) at java.net.URLClassLoader$1.run(URLClassLoader.java:368) at java.net.URLClassLoader$1.run(URLClassLoader.java:362) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:361) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370) at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404) at java.util.ServiceLoader$1.next(ServiceLoader.java:480) at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:3268) at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3313) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3352) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361) at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:85) at org.apache.spark.sql.execution.datasources.HadoopFileLinesReader.<init>(HadoopFileLinesReader.scala:46) at org.apache.spark.sql.execution.datasources.json.TextInputJsonDataSource$.readFile(JsonDataSource.scala:125) at org.apache.spark.sql.execution.datasources.json.JsonFileFormat$$anonfun$buildReader$2.apply(JsonFileFormat.scala:132) at org.apache.spark.sql.execution.datasources.json.JsonFileFormat$$anonfun$buildReader$2.apply(JsonFileFormat.scala:130) at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:148) at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:132) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:128) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:182) at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:216) at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:108) at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:101) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
I did not find useful info online. Any clue is appreciated.
Created 08-30-2018 04:57 PM
Per https://spark.apache.org/docs/latest/building-spark.html, spark 2.3.1 is built with hadoop 2.6.X by default. This is why I see my fat jar includes hadoop 2.6.5 (instead of 3.1.0) jars. HftpFileSystem has been removed in hadoop 3. I need spark 2.3.1 jars that built with hadoop 3.1.
On https://spark.apache.org/downloads.html, I only see spark 2.3.1 built with hadoop 2.7. Where can I get spark 2.3.1 built with hadoop 3? Does spark 2.3.1 support hadoop 3? Appreciate your help.
[UPDATE] I solved this issue by using spark 2.3.1 jars under /usr/hdp/current/spark2-client/ from the HDP3.0 cluster. Thanks.
Created 08-30-2018 04:57 PM
Per https://spark.apache.org/docs/latest/building-spark.html, spark 2.3.1 is built with hadoop 2.6.X by default. This is why I see my fat jar includes hadoop 2.6.5 (instead of 3.1.0) jars. HftpFileSystem has been removed in hadoop 3. I need spark 2.3.1 jars that built with hadoop 3.1.
On https://spark.apache.org/downloads.html, I only see spark 2.3.1 built with hadoop 2.7. Where can I get spark 2.3.1 built with hadoop 3? Does spark 2.3.1 support hadoop 3? Appreciate your help.
[UPDATE] I solved this issue by using spark 2.3.1 jars under /usr/hdp/current/spark2-client/ from the HDP3.0 cluster. Thanks.
Created 09-26-2019 12:19 PM
what is the exact solution?
"I solved this issue by using spark 2.3.1 jars under /usr/hdp/current/spark2-client/ from the HDP3.0 cluster."
I get this on spark-submit. Are you saying that i should pass --jars /usr/hdp/current/spark2-client/*.jar ?
Created 09-26-2019 01:11 PM
what is the exact solution?
"I solved this issue by using spark 2.3.1 jars under /usr/hdp/current/spark2-client/ from the HDP3.0 cluster."
I get this on spark-submit. Are you saying that i should pass --jars /usr/hdp/current/spark2-client/*.jar ?
Created 09-24-2018 12:31 PM
Hi @Lian Jiang,
I'm facing the exactly same issue. I read your 'update' but I can't solve it. Could you give more details about your solution?
I copied the (spark*.jar) jars from /usr/hdp/current/spark2-client/ to /usr/hdp/3.0.0.0-1634/zeppelin/lib then restart my ambari-server and zeppelin. It does not work.
Created 09-25-2018 08:45 PM
Please copy all jar files instead of only spark*.jar. Hope this helps. Thanks.
Created 06-03-2019 05:17 PM
We have the same issue, for sbt i tried to copy /usr/hdp/current/spark2-client/jars/* inside /.ivy2/cache but still doesn't work.. How did you exactyl solve it ?
Created 07-09-2020 12:23 PM
I had the same problem with a fat scala jar that no longer worked after an upgrade to Cloudera 6.3.3.
This did the trick for me:
Build a thin jar instead by using the "provided" tag in the build.sbt
"org.apache.spark" %% "spark-core" % "2.4.0" % "provided",
"org.apache.spark" %% "spark-sql" % "2.4.0" % "provided",
"org.apache.spark" %% "spark-hive" % "2.4.0" % "provided",
"commons-httpclient" % "commons-httpclient" % "3.1",
The httpclient jar had to be added and I also had to force utf-8 encoding.
Created on 07-15-2020 06:45 AM - edited 07-15-2020 06:47 AM
That is the reason I see my fat container containing Hadoop 2.6.5 containers (rather than 3.1.0). Expelled HftpFileSystem from Hadoop 3. I need a 2.3.1 container worked from Hadoop 3.1. I just observe Spark 2.3.1 worked from Hadoop 2.7. Where would i be able to get Spark 2.3.1 worked with Hadoop 3? Sparkle 2.3.1 backings Hadoop 3? A debt of gratitude is in order for the assistance. I tackled this issue utilizing 2.3.1 over the HDP3.0 group under/usr/hdp/current/spark2-customer/.