Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference. Information and links may no longer be available or relevant.

HDP3.0: spark structured streaming jobs working in HDP2.6 fail

Rising Star

Hi,

My Spark structured streaming jobs, which worked on HDP 2.6, fail on HDP 3.0:

java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem cannot access its superinterface org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:3268)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3313)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3352)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:85)
    at org.apache.spark.sql.execution.datasources.HadoopFileLinesReader.<init>(HadoopFileLinesReader.scala:46)
    at org.apache.spark.sql.execution.datasources.json.TextInputJsonDataSource$.readFile(JsonDataSource.scala:125)
    at org.apache.spark.sql.execution.datasources.json.JsonFileFormat$$anonfun$buildReader$2.apply(JsonFileFormat.scala:132)
    at org.apache.spark.sql.execution.datasources.json.JsonFileFormat$$anonfun$buildReader$2.apply(JsonFileFormat.scala:130)
    at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:148)
    at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:132)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:128)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:182)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:216)
    at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:108)
    at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:101)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I could not find any useful information online. Any clue is appreciated.

1 ACCEPTED SOLUTION

Rising Star

Per https://spark.apache.org/docs/latest/building-spark.html, Spark 2.3.1 is built against Hadoop 2.6.x by default. This is why my fat jar includes Hadoop 2.6.5 jars (instead of 3.1.0). HftpFileSystem has been removed in Hadoop 3, so I need Spark 2.3.1 jars built against Hadoop 3.1.
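
One quick way to confirm which Hadoop actually ends up on the classpath is a minimal check like the following, run from spark-shell or inside the job (the class-location lookup is just one way to spot the offending jar):

    // Print the Hadoop version Spark is linked against, and the jar that
    // org.apache.hadoop.fs.FileSystem was actually loaded from.
    println(org.apache.hadoop.util.VersionInfo.getVersion)
    println(classOf[org.apache.hadoop.fs.FileSystem]
      .getProtectionDomain.getCodeSource.getLocation)

If this prints 2.6.x on an HDP 3.0 cluster, the fat jar is shipping its own Hadoop client.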

On https://spark.apache.org/downloads.html, I only see Spark 2.3.1 built with Hadoop 2.7. Where can I get Spark 2.3.1 built with Hadoop 3? Does Spark 2.3.1 support Hadoop 3? I appreciate your help.

[UPDATE] I solved this issue by using the Spark 2.3.1 jars under /usr/hdp/current/spark2-client/ on the HDP 3.0 cluster. Thanks.
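
For an sbt build, one possible way to compile against those cluster jars instead of bundling Hadoop 2.6.x is to add them as unmanaged dependencies (a sketch only; this assumes the job is built on a host where that HDP path exists, and is not necessarily how it was wired up here):

    // build.sbt: put every jar shipped with the HDP spark2-client on the
    // compile classpath, so no Spark/Hadoop jars need to go into the assembly.
    unmanagedJars in Compile ++=
      (file("/usr/hdp/current/spark2-client/jars") ** "*.jar").classpath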



8 REPLIES

Frequent Visitor

What is the exact solution?

"I solved this issue by using the Spark 2.3.1 jars under /usr/hdp/current/spark2-client/ on the HDP 3.0 cluster."
I get this error on spark-submit. Are you saying that I should pass --jars /usr/hdp/current/spark2-client/*.jar?

New Member

Hi @Lian Jiang,

I'm facing exactly the same issue. I read your 'update', but I still can't solve it. Could you give more details about your solution?

I copied the Spark jars (spark*.jar) from /usr/hdp/current/spark2-client/ to /usr/hdp/3.0.0.0-1634/zeppelin/lib and then restarted ambari-server and Zeppelin. It does not work.

Rising Star

@Hayati İbiş

Please copy all jar files instead of only spark*.jar. Hope this helps. Thanks.


@Lian Jiang

We have the same issue. For sbt, I tried copying /usr/hdp/current/spark2-client/jars/* into /.ivy2/cache, but it still doesn't work. How exactly did you solve it?


I had the same problem with a fat Scala jar that no longer worked after an upgrade to Cloudera 6.3.3.

This did the trick for me:

Build a thin jar instead, by marking the Spark dependencies as "provided" in build.sbt:

  "org.apache.spark" %% "spark-core" % "2.4.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided",
  "org.apache.spark" %% "spark-hive" % "2.4.0" % "provided",
  "commons-httpclient" % "commons-httpclient" % "3.1",

The commons-httpclient jar had to be added, and I also had to force UTF-8 encoding.
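
If it helps, one place to force UTF-8 in an sbt build is the compiler options (an assumption about what "force UTF-8 encoding" referred to here; JVM flags are another option):

    // build.sbt: force UTF-8 source encoding for the Scala and Java compilers
    scalacOptions ++= Seq("-encoding", "UTF-8")
    javacOptions ++= Seq("-encoding", "UTF-8")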

New Member

That is the reason I see my fat jar containing Hadoop 2.6.5 jars (rather than 3.1.0). HftpFileSystem was removed from Hadoop 3, so I need Spark 2.3.1 jars built with Hadoop 3.1. I only see Spark 2.3.1 built with Hadoop 2.7. Where can I get Spark 2.3.1 built with Hadoop 3? Does Spark 2.3.1 support Hadoop 3? Thanks for the help. I solved this issue by using the Spark 2.3.1 jars on the HDP 3.0 cluster, under /usr/hdp/current/spark2-client/.