
Exception while trying to connect using spark-shell and pyspark


New Contributor

I have upgraded from CDH 5.1.x to CDH 5.7.1 (Express edition). In the prior version I was able to connect to Spark using spark-shell and pyspark, but after upgrading I get the exception below when I run the spark-shell command:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:117)
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:117)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:117)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:103)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 7 more

 

I have checked the Hadoop home path and all related configuration, and everything seems fine. Can anyone please help figure out what the issue is?
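One way to narrow this down: the missing class, org.apache.hadoop.fs.FSDataInputStream, ships in hadoop-common, so checking whether the Hadoop client itself can see its jars tells you whether the problem is in Hadoop or only in Spark's classpath. A rough diagnostic sketch, assuming the standard CDH parcel layout (the /opt/cloudera/parcels path is an assumption, not from the original post):

```shell
# Print the classpath the hadoop launcher builds; it should list the
# CDH 5.7.1 parcel jars, including hadoop-common.
hadoop classpath

# Confirm the HDFS client works at all; if this fails too, the problem
# is below Spark, in the Hadoop installation itself.
hadoop fs -ls /

# Look for the jar that provides the missing class (path assumes the
# standard CDH parcel layout on this node).
ls /opt/cloudera/parcels/CDH/jars/hadoop-common-*.jar
```

If `hadoop classpath` works but spark-shell still fails, the issue is likely that Spark is not picking up the Hadoop jars rather than the jars being absent.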

2 REPLIES

Re: Exception while trying to connect using spark-shell and pyspark

Champion
It can't find the Hadoop class org/apache/hadoop/fs/FSDataInputStream. Does the HDFS client work correctly?

These are blind stabs, but you could try installing or reinstalling the HDFS gateway on the same node, or reinstalling the Spark gateway. Also make sure that the spark-shell you are running comes from the CDH 5.7.1 parcel.
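To check the last point, you can trace which spark-shell binary the shell actually resolves; a stale symlink or a leftover binary from the pre-upgrade install is a common cause of this kind of mismatch after an upgrade. A minimal sketch (the CDH-5.7.1 parcel path shown in the comment is an assumption about a typical layout):

```shell
# Show which spark-shell is first on the PATH.
which spark-shell

# Resolve any symlinks to the real file; on a healthy node this should
# point somewhere under the active parcel, e.g. /opt/cloudera/parcels/CDH-5.7.1-*/.
readlink -f "$(which spark-shell)"
```

If the resolved path points at an old install directory instead of the 5.7.1 parcel, fixing the alternatives/symlinks or reinstalling the Spark gateway role should resolve it.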

Re: Exception while trying to connect using spark-shell and pyspark

Cloudera Employee

Hi,

 

This error suggests that a jar file is missing. Did you try adding the relevant Hadoop jars to the classpath?
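As a concrete way to do that: Spark can be pointed at an external Hadoop installation by setting SPARK_DIST_CLASSPATH from the output of `hadoop classpath`. This is a standard Spark mechanism, but whether it is the right fix here depends on how the CDH gateway roles are set up, so treat it as a workaround sketch rather than the CDH-recommended configuration:

```shell
# Put the full Hadoop client classpath (which includes hadoop-common and
# therefore FSDataInputStream) onto Spark's distribution classpath.
export SPARK_DIST_CLASSPATH="$(hadoop classpath)"

# Then retry the shell.
spark-shell
```

If this makes the error go away, the underlying problem is that Spark's own classpath configuration (e.g. spark-env.sh on this node) is not including the Hadoop jars, which reinstalling the Spark gateway would normally fix.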

 

Thanks

AK