New Contributor
Posts: 5
Registered: ‎11-11-2015

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

In our case it was a problem with the Hadoop classpath configuration.

 

A quick fix is (big thanks to one of my colleagues):

 

export SPARK_DIST_CLASSPATH=$(hadoop classpath)

spark-submit ...

 

You can set it in the Spark environment script (spark-env.sh) or just add the export line to your .bashrc.
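For anyone adding this to a startup file, here is a minimal sketch of a slightly more defensive version of the same export. The function name set_spark_dist_classpath is made up for illustration, and keeping an already-set value is an assumption about what you want, not something the fix requires.

```shell
# Sketch for spark-env.sh or ~/.bashrc (function name is hypothetical).
# Sets SPARK_DIST_CLASSPATH from `hadoop classpath` only when the variable
# is not already set and the hadoop command can be found.
set_spark_dist_classpath() {
  if [ -n "${SPARK_DIST_CLASSPATH:-}" ]; then
    return 0  # assumption: an existing value is intentional, so keep it
  fi
  if command -v hadoop >/dev/null 2>&1; then
    SPARK_DIST_CLASSPATH="$(hadoop classpath)"
    export SPARK_DIST_CLASSPATH
  else
    echo "hadoop not found on PATH; SPARK_DIST_CLASSPATH left unset" >&2
    return 1
  fi
}

# Do not abort the login shell if hadoop is missing.
set_spark_dist_classpath || true
```

If the warning shows up when you open a shell, the hadoop command itself is not on your PATH, which is worth fixing before looking at Spark.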

 

Explorer
Posts: 14
Registered: ‎10-19-2015

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

Thanks for the suggestion.

 

I finally reinstalled CDH 5.5 from scratch, i.e. I uninstalled it and did a clean install instead of an update. That worked. I am now able to run everything fine.

New Contributor
Posts: 7
Registered: ‎08-30-2016

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

I had the same issue when executing spark-submit. The cause was that not all of my paths were set. If you know which jars you need to include, there is an easy fix: pass them with the --jars parameter, like this:

 

spark-submit ... --jars /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/jars/hadoop-common-2.6.0-cdh5.10.0.jar ...
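Since the jar name above is tied to one parcel version, here is a rough sketch of looking the jar up instead of hardcoding it. The helper name find_hadoop_common_jar is hypothetical, and the /opt/cloudera/parcels/CDH/jars directory in the usage comment is an assumption about the parcel layout on your nodes, so check that it exists first.

```shell
# Sketch (helper name is hypothetical): print the first hadoop-common jar
# found in the given directory, skipping the "-tests" jar.
find_hadoop_common_jar() {
  fhcj_dir="$1"
  for fhcj_jar in "$fhcj_dir"/hadoop-common-*.jar; do
    case "$fhcj_jar" in
      *-tests.jar) continue ;;  # not the jar that provides FSDataInputStream
    esac
    if [ -f "$fhcj_jar" ]; then
      printf '%s\n' "$fhcj_jar"
      return 0
    fi
  done
  return 1  # nothing matched (the unexpanded glob is not a regular file)
}

# Usage, assuming the jars live under /opt/cloudera/parcels/CDH/jars:
#   spark-submit ... --jars "$(find_hadoop_common_jar /opt/cloudera/parcels/CDH/jars)" ...
```

This only saves you from editing the version string after an upgrade; it still passes a single jar, exactly like the command above.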

New Contributor
Posts: 1
Registered: ‎08-19-2017

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream


mauriciojost wrote:

 

I was getting the same stack trace. Apparently some environment variables are set incorrectly. Curiously, manually sourcing spark-env.sh before launching spark-submit, to pick up what I think are the correct environment variables, worked:

 

source /etc/spark/conf/spark-env.sh
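If you end up repeating this step, it can be wrapped in a small function, sketched below. The name source_spark_env is made up for illustration, and the default path is simply the one from the command above. This does not explain why the variables were missing in the first place; it only makes the workaround fail loudly when the file is absent.

```shell
# Sketch (function name is hypothetical): source spark-env.sh into the
# current shell, or report clearly when the file cannot be read.
source_spark_env() {
  sse_file="${1:-/etc/spark/conf/spark-env.sh}"
  if [ -r "$sse_file" ]; then
    # Use "." so the exported variables land in the calling shell.
    . "$sse_file"
  else
    echo "cannot read $sse_file; environment left unchanged" >&2
    return 1
  fi
}

# Usage:
#   source_spark_env && spark-submit ...
```

After calling it, `echo "$SPARK_DIST_CLASSPATH"` is a quick way to see whether the script actually set a classpath on your machine.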

 

 


Ok this worked for me too. 

Now I am wondering how to really fix this, rather than having to use this workaround every time.

Any ideas and explanations are welcome.
