Explorer
Posts: 14
Registered: ‎10-19-2015

CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

Hello,

 

After upgrading a single-node CentOS VM from CDH 5.4.x to CDH 5.5, pyspark fails to start with the error:

 

 java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

 

HADOOP_HOME points to the correct path in /etc/spark/conf/set-env.sh.

 

Any pointers on debugging this issue? I don't want to download a newer VM from Cloudera, since I have too much invested in the current one.

 

I used parcels to upgrade from CDH 5.4.x to CDH 5.5 via Cloudera Manager.

I completed all the specified tasks to deploy the new jars, etc.

 

New Contributor
Posts: 3
Registered: ‎01-08-2016

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

I experienced exactly the same issue when migrating from CDH 5.4.8 to CDH 5.5.1 (I am a user of the cluster rather than its administrator). When running:

 

spark-submit --master yarn --class $CLASS $FATJAR

 

I got the same stack trace. Apparently some environment variables were set incorrectly. Curiously, manually sourcing spark-env.sh to pick up what I believe are the correct environment variables before launching spark-submit worked:

 

source /etc/spark/conf/spark-env.sh

spark-submit --master yarn --class $CLASS $FATJAR
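
To see exactly which variables that manual source step contributes, one option is to diff the environment before and after sourcing. This is only a minimal sketch, not a CDH or Spark tool: env_added_by is a hypothetical helper name, and it assumes the script can be sourced safely inside a subshell.

```shell
# Hypothetical helper: list the variables that sourcing a script adds or changes.
# Everything runs in a subshell, so the caller's environment is left untouched.
env_added_by() {
  (
    _script="$1"
    _before="$(mktemp)"
    _after="$(mktemp)"
    env | LC_ALL=C sort > "$_before"
    set -a                          # export every variable the script assigns
    . "$_script" >/dev/null 2>&1
    set +a
    env | LC_ALL=C sort > "$_after"
    # Lines that appear only in the second snapshot are new or changed.
    LC_ALL=C comm -13 "$_before" "$_after"
    rm -f "$_before" "$_after"
  )
}
```

For example, env_added_by /etc/spark/conf/spark-env.sh would list whatever that file sets on your node. Comparing that output with the environment in which spark-submit fails should narrow down the missing variable.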

 

 

New Contributor
Posts: 3
Registered: ‎01-08-2016

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

Sourcing spark-env.sh added some new environment variables. The one that made things work was:

 

export SPARK_EXTRA_LIB_PATH=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/lib:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop/lib/native
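
Because the parcel directory name embeds the full CDH version, a hard-coded value like this goes stale on the next parcel upgrade. Here is a small sketch that builds the same two-entry value from a parcel root passed as an argument. extra_lib_path_for is a hypothetical helper, and the lib/spark/lib and lib/hadoop/lib/native layout is taken from the export above and only assumed to hold for other parcels.

```shell
# Hypothetical helper: print a SPARK_EXTRA_LIB_PATH value for a given parcel root.
# Assumes the parcel keeps the lib/spark/lib and lib/hadoop/lib/native layout
# seen in the export above.
extra_lib_path_for() {
  parcel_root="${1%/}"   # drop one trailing slash, if any
  for dir in "$parcel_root/lib/spark/lib" "$parcel_root/lib/hadoop/lib/native"; do
    # Warn on stderr so the printed value stays clean for command substitution.
    [ -d "$dir" ] || echo "warning: $dir does not exist" >&2
  done
  printf '%s:%s\n' "$parcel_root/lib/spark/lib" "$parcel_root/lib/hadoop/lib/native"
}
```

Running export SPARK_EXTRA_LIB_PATH="$(extra_lib_path_for /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11)" reproduces the value shown above.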

 

Contributor
Posts: 33
Registered: ‎01-08-2016

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

Hello Mauriciojost,

 

Is it working now?

 

If not, could you please share the value of the SPARK_DIST_CLASSPATH variable from spark-env.sh? This variable should pull the Hadoop package jars onto the classpath.
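
When looking at that value, two quick checks are whether every entry exists on disk and whether a hadoop-common jar (the jar that ships org.apache.hadoop.fs.FSDataInputStream) is listed at all. The following is a rough sketch with hypothetical helper names. It only recognises explicitly listed jars and does not look inside wildcard directories.

```shell
# Hypothetical helper: print classpath entries that do not exist on disk.
missing_classpath_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
    [ -z "$entry" ] && continue
    # A trailing /* is the JVM wildcard for "all jars in this directory",
    # so check the directory itself.
    path="${entry%/\*}"
    [ -e "$path" ] || printf '%s\n' "$entry"
  done
}

# Hypothetical helper: succeed if an entry is named like a hadoop-common jar.
has_hadoop_common() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -q '/hadoop-common[^/]*\.jar$'
}
```

After sourcing spark-env.sh, missing_classpath_entries "$SPARK_DIST_CLASSPATH" prints any stale entries, and has_hadoop_common "$SPARK_DIST_CLASSPATH" reports through its exit status whether such a jar is named explicitly.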

 

Hope that helps. 

New Contributor
Posts: 3
Registered: ‎01-08-2016

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

Yes, it is working correctly now, thanks. Setting the environment variable SPARK_EXTRA_LIB_PATH exactly as the script sets it did the trick for me.

Contributor
Posts: 33
Registered: ‎01-08-2016

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

Nice :) Good to hear that.

 

 

devquestions2, could you please try the same by setting the SPARK_EXTRA_LIB_PATH environment variable and share the result?

Explorer
Posts: 14
Registered: ‎10-19-2015

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

I will try it, but I have another issue: I get exactly the same error when I try to start the Spark History Server from Cloudera Manager.
Is this related to setting SPARK_EXTRA_LIB_PATH? If so, how do I fix it in Cloudera Manager?

Explorer
Posts: 14
Registered: ‎10-19-2015

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

Here is the stderr from Cloudera Manager:

 

CDH_KMS_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop-kms
++ export CDH_PARQUET_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/parquet
++ CDH_PARQUET_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/parquet
++ export CDH_AVRO_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/avro
++ CDH_AVRO_HOME=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/avro
+ echo 'Using /var/run/cloudera-scm-agent/process/1230-spark_on_yarn-SPARK_YARN_HISTORY_SERVER as conf dir'
+ echo 'Using scripts/control.sh as process script'
+ export COMMON_SCRIPT=/usr/lib64/cmf/service/common/cloudera-config.sh
+ COMMON_SCRIPT=/usr/lib64/cmf/service/common/cloudera-config.sh
+ chmod u+x /var/run/cloudera-scm-agent/process/1230-spark_on_yarn-SPARK_YARN_HISTORY_SERVER/scripts/control.sh
+ exec /var/run/cloudera-scm-agent/process/1230-spark_on_yarn-SPARK_YARN_HISTORY_SERVER/scripts/control.sh start_history_server '' -d /user/spark/applicationHistory
Tue Jan 12 18:27:03 EST 2016
Tue Jan 12 18:27:03 EST 2016: Detected CDH_VERSION of [5]
Tue Jan 12 18:27:03 EST 2016: Starting Spark History Server
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
    at org.apache.spark.SparkConf.<init>(SparkConf.scala:59)
    at org.apache.spark.SparkConf.<init>(SparkConf.scala:53)
    at org.apache.spark.deploy.history.HistoryServer$.<init>(HistoryServer.scala:219)
    at org.apache.spark.deploy.history.HistoryServer$.<clinit>(HistoryServer.scala)
    at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 5 more
Tue Jan 12 18:27:09 EST 2016
+ locate_java_home

Explorer
Posts: 14
Registered: ‎10-19-2015

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

Which spark-env.sh should I use?

I see it in three locations:

 

/etc/spark/conf.cloudera.spark_on_yarn/spark-env.sh
/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/etc/spark/conf.dist/spark-env.sh
/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/bin/load-spark-env.sh
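
Of those three paths, load-spark-env.sh is Spark's loader script rather than a configuration file. One way to narrow down which spark-env.sh a launcher picks up is to resolve the conf directory it would use. The sketch below assumes the loader reads SPARK_CONF_DIR and otherwise falls back to the conf directory under the Spark home. effective_spark_env is a hypothetical helper, and the lookup order should be checked against load-spark-env.sh on your node.

```shell
# Hypothetical helper: print the spark-env.sh a launcher would source.
# Assumption: SPARK_CONF_DIR wins, otherwise <spark_home>/conf is used.
effective_spark_env() {
  spark_home="$1"
  conf_dir="${SPARK_CONF_DIR:-$spark_home/conf}"
  candidate="$conf_dir/spark-env.sh"
  if [ -f "$candidate" ]; then
    # Resolve symlinks so a linked conf directory shows its real target.
    readlink -f "$candidate"
  else
    echo "no spark-env.sh under $conf_dir" >&2
    return 1
  fi
}
```

Calling it with the Spark home of the active parcel, for instance effective_spark_env /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark, prints the resolved file if one exists. Note that the stderr earlier in the thread shows Cloudera Manager starting the history server with its own conf dir under /var/run/cloudera-scm-agent/process, so that role may not read any of these files.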

Explorer
Posts: 14
Registered: ‎10-19-2015

Re: CDH 5.5 pyspark java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

Any suggestions? Where should I start debugging?
