Spark Error on CloudBreak: Class org.apache.hadoop.fs.adl.AdlFileSystem not found

Hi all,

I installed an HDP cluster on Cloudbreak and am trying to run a simple Spark job. I open the pyspark shell and run the following:

ip = "adl://alenzadls1.azuredatalakestore.net/path/to/my/input/directory"

input_data = sc.textFile(ip)

for x in input_data.collect():
    print x

The collect() call throws the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/current/spark-client/python/pyspark/rdd.py", line 771, in collect
    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/usr/hdp/current/spark-client/python/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)
  File "/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.adl.AdlFileSystem not found

Can someone point me to what is going wrong here? I could not find anything related to this online.

1 REPLY

@kskp

You might try it on a newer Hadoop version.

As of HDP 2.6.1, the bundled Hadoop is 2.7.3, which has a known bug very similar to the one you are hitting.
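
In the meantime, a quick check and possible workaround along these lines can be tried from the pyspark shell. This is only a sketch: fs.adl.impl and fs.AbstractFileSystem.adl.impl are the standard Hadoop properties for ADLS support, but the jar paths in the comments are assumptions for a typical HDP install and may differ on your cluster.

# Check whether the ADL filesystem class is visible to the JVM Spark runs in
try:
    sc._jvm.java.lang.Class.forName("org.apache.hadoop.fs.adl.AdlFileSystem")
    print "AdlFileSystem is on the classpath"
except Exception as e:
    print "AdlFileSystem is NOT on the classpath: %s" % e

# If it is missing, the hadoop-azure-datalake and azure-data-lake-store-sdk
# jars have to be on the classpath when pyspark is launched, for example:
#   pyspark --jars /usr/hdp/current/hadoop-client/hadoop-azure-datalake.jar,/usr/hdp/current/hadoop-client/lib/azure-data-lake-store-sdk.jar
# (these paths are an assumption; locate the jars on your nodes first)

# With the jars present, the filesystem implementation can be set explicitly
hconf = sc._jsc.hadoopConfiguration()
hconf.set("fs.adl.impl", "org.apache.hadoop.fs.adl.AdlFileSystem")
hconf.set("fs.AbstractFileSystem.adl.impl", "org.apache.hadoop.fs.adl.Adl")

Note that the jars cannot be added after the shell is already running; they need to be supplied at launch time, or configured cluster-wide (for example through Ambari on an HDP cluster).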

Hope this helps!