Spark error on Cloudbreak: Class org.apache.hadoop.fs.adl.AdlFileSystem not found


I installed an HDP cluster on Cloudbreak and am trying to run a simple Spark job. I open the "pyspark" shell and run the following:

ip = "adl://"

input_data = sc.textFile(ip)

for x in input_data.collect():
    print(x)

Running the loop fails with the following error, raised from collect() before anything is printed:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/current/spark-client/python/pyspark/", line 771, in collect
    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/usr/hdp/current/spark-client/python/lib/", line 813, in __call__
  File "/usr/hdp/current/spark-client/python/pyspark/sql/", line 45, in deco
    return f(*a, **kw)
  File "/usr/hdp/current/spark-client/python/lib/", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.adl.AdlFileSystem not found

Can someone point out what is going wrong? I could not find anything related to this online.
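
For context on the error: Hadoop picks the FileSystem class for a path from its URI scheme by reading the configuration key fs.<scheme>.impl (here fs.adl.impl), and Configuration.getClass wraps a ClassNotFoundException in a RuntimeException when the class named there cannot be loaded, which matches the shape of the message above. The minimal sketch below reproduces that key-naming rule and pulls the missing class name out of the error text. The helpers fs_impl_key and missing_class are hypothetical, not part of PySpark or Hadoop.

```python
import re

try:  # Python 3
    from urllib.parse import urlparse
except ImportError:  # Python 2, as used by the HDP pyspark shell
    from urlparse import urlparse

# Matches the "Class <name> not found" wording seen in the error above
_CNFE = re.compile(r"ClassNotFoundException: Class (\S+) not found")


def fs_impl_key(uri):
    """Return the Hadoop config key consulted for this URI's scheme (hypothetical helper)."""
    scheme = urlparse(uri).scheme
    if not scheme:
        raise ValueError("URI has no scheme: %r" % uri)
    return "fs.%s.impl" % scheme


def missing_class(message):
    """Extract the class name from a 'Class ... not found' error, or return None."""
    match = _CNFE.search(message)
    return match.group(1) if match else None


err = ("java.lang.RuntimeException: java.lang.ClassNotFoundException: "
       "Class org.apache.hadoop.fs.adl.AdlFileSystem not found")
print(fs_impl_key("adl://"))  # fs.adl.impl
print(missing_class(err))     # org.apache.hadoop.fs.adl.AdlFileSystem
```

In the live pyspark shell from the question, sc._jsc.hadoopConfiguration().get(fs_impl_key(ip)) returns the class name the cluster configuration maps adl:// to (or None if the key is unset). _jsc is an internal attribute, so treat this as a debugging aid only.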



You could try running it on a newer Hadoop version.
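
To check which Hadoop release Spark is actually running against, before and after an upgrade, you can ask the JVM and compare the result. The sketch below assumes HDP-style version strings such as 2.7.3.2.6.1.0-129, where the upstream Hadoop version comes first and the HDP build follows. parse_hadoop_version and is_older_than are hypothetical helpers, and the minimum you compare against is your choice.

```python
import re


def parse_hadoop_version(version):
    """Return the leading (major, minor, patch) of a Hadoop version string as ints."""
    # Only the first three numeric components are read; any HDP build suffix is ignored
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("Unrecognised Hadoop version: %r" % version)
    return tuple(int(part) for part in match.groups())


def is_older_than(version, minimum):
    """True if the upstream Hadoop release in version is below the (major, minor, patch) minimum."""
    return parse_hadoop_version(version) < tuple(minimum)
```

In the pyspark shell, sc._jvm.org.apache.hadoop.util.VersionInfo.getVersion() returns the version string of the Hadoop libraries Spark loaded; pass it to is_older_than together with the (major, minor, patch) tuple of the release you are targeting.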

HDP 2.6.1 is built on Hadoop 2.7.3, which has a known bug very similar to yours.
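
Whichever version you end up on, a quick way to tell whether AdlFileSystem is installed anywhere on a node (and so only missing from Spark's classpath) or not installed at all is to search the jars for the matching .class entry. The sketch below does that with the standard zipfile module; jars_containing is a hypothetical helper, and the directory you scan is an assumption, since install layouts vary between clusters.

```python
import os
import zipfile

# BadZipFile is the Python 3 name; Python 2.7 only has BadZipfile
BAD_ZIP = getattr(zipfile, "BadZipFile", None) or zipfile.BadZipfile


def class_entry(class_name):
    """Convert a Java class name to its path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_containing(class_name, root):
    """Return sorted paths of .jar files under root that contain class_name."""
    entry = class_entry(class_name)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        hits.append(path)
            except (BAD_ZIP, IOError):
                # Skip corrupt or unreadable archives instead of aborting the scan
                continue
    return sorted(hits)
```

For example, jars_containing("org.apache.hadoop.fs.adl.AdlFileSystem", "/usr/hdp") scans everything under the HDP install directory seen in the traceback paths; an empty list means the class is not installed there at all.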

Hope this helps!