CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

New Contributor

In our case it was a problem with the Hadoop classpath configuration.

 

A quick fix (big thanks to one of my colleagues):

 

export SPARK_DIST_CLASSPATH=$(hadoop classpath)

spark-submit ...

 

You can set it in the Spark env script (spark-env.sh) or just add the line to your .bashrc.
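If you put the line in spark-env.sh or .bashrc, it may be worth guarding it so that the shell does not break on machines where the hadoop command is missing. A minimal sketch, assuming a POSIX-style shell; the function name set_spark_dist_classpath is made up for illustration and is not part of Spark or CDH:

```shell
# Hypothetical helper: export SPARK_DIST_CLASSPATH from `hadoop classpath`,
# but only when a `hadoop` command can actually be found.
set_spark_dist_classpath() {
  if command -v hadoop >/dev/null 2>&1; then
    SPARK_DIST_CLASSPATH="$(hadoop classpath)"
    export SPARK_DIST_CLASSPATH
  else
    echo "hadoop not found on PATH; SPARK_DIST_CLASSPATH left unchanged" >&2
    return 1
  fi
}

# "|| true" keeps a missing hadoop from aborting the shell that sources this.
set_spark_dist_classpath || true
```

Afterwards, echo "$SPARK_DIST_CLASSPATH" should print the Hadoop jars and config directories if everything is wired up.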

 

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

Thanks for the suggestion.

 

I finally reinstalled CDH 5.5 from scratch, i.e. uninstalled it and did a clean install instead of an upgrade. That worked. I am now able to run everything fine.

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

Explorer

@mauriciojost wrote:

 

I was getting the same stack trace. Apparently some environment variables are set incorrectly. Curiously, manually sourcing spark-env.sh to get (what I think are) the correct environment variables before launching spark-submit worked:

 

source /etc/spark/conf/spark-env.sh

 

 


Ok this worked for me too. 

Now I am wondering how to really fix this, rather than having to use this workaround every time.

Any ideas and explanations are welcome
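One way to narrow this down is to see exactly which environment variables the sourced script adds or changes, since those are the ones missing from a normal session. A rough sketch, assuming a POSIX-style shell with mktemp and comm available; show_env_changes is a hypothetical helper name, and values that span several lines are not handled:

```shell
# Hypothetical helper: list exported variables that are new or changed
# after sourcing the given script. Runs in a subshell, so the calling
# shell's environment is left untouched.
show_env_changes() {
  env_script="$1"
  if [ ! -r "$env_script" ]; then
    echo "cannot read $env_script" >&2
    return 1
  fi
  (
    before=$(mktemp)
    after=$(mktemp)
    env | sort > "$before"
    . "$env_script" >/dev/null 2>&1
    env | sort > "$after"
    # comm -13 keeps lines that appear only in the second (after) file.
    comm -13 "$before" "$after"
    rm -f "$before" "$after"
  )
}

# Example call (path taken from the post above):
#   show_env_changes /etc/spark/conf/spark-env.sh
```

Whatever it prints is a candidate for what your login environment is missing.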

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

Explorer

I had the same issue when executing spark-submit. The issue was that I did not have all my paths set. There is an easy fix if you know which jars you want to include: simply use the --jars parameter, like this:

 

spark-submit ... --jars /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/jars/hadoop-common-2.6.0-cdh5.10.0.jar ...
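Since the jar path above hard-codes the parcel and CDH version, it will break after an upgrade. A small sketch of looking the jar up instead, assuming you pass in the directory that holds the jars; find_hadoop_common_jar is a made-up helper name, and skipping file names containing "tests" is my assumption about which jar you want:

```shell
# Hypothetical helper: print the first hadoop-common jar found in a
# directory, skipping any "tests" jar. Returns 1 if none is found.
find_hadoop_common_jar() {
  jars_dir="$1"
  for jar in "$jars_dir"/hadoop-common-*.jar; do
    case "$jar" in
      *tests*) continue ;;
    esac
    # -f also filters out the unexpanded pattern when nothing matches.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  return 1
}

# Example call (directory taken from the command above):
#   find_hadoop_common_jar /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/jars
```

The printed path can then be passed to --jars in place of the literal one.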

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

Cloudera Employee

Another cause of this error is running spark-submit when you only have Spark 2 installed. In that case you just need to run spark2-submit instead (no configuration changes needed).
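For wrapper scripts that have to run on clusters with either version, the launcher can be picked at run time. A minimal sketch, assuming a POSIX-style shell; pick_spark_submit is a hypothetical name, and preferring spark2-submit whenever it exists is an assumption that may not suit clusters with both versions installed:

```shell
# Hypothetical helper: print the name of the launcher to use.
# Prefers spark2-submit when it is on PATH, else falls back to spark-submit.
pick_spark_submit() {
  if command -v spark2-submit >/dev/null 2>&1; then
    echo spark2-submit
  elif command -v spark-submit >/dev/null 2>&1; then
    echo spark-submit
  else
    echo "no spark-submit launcher found on PATH" >&2
    return 1
  fi
}

# Example use:
#   launcher=$(pick_spark_submit) && "$launcher" ...
```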
