
Spark 2 not working after upgrade. PySpark error

SOLVED


Explorer

Hi all,

I had Spark 1.6 working in my cluster with YARN. I wanted to use Spark 2 in my cluster because of DataFrames, and I followed the instructions in this link to install it: https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html

Once I had finally installed Spark 2, trying to start pyspark from the console gives me the following stack trace:

 

/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/bin$ pyspark
Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
	at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
	at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:123)
	at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:109)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/shell.py", line 43, in <module>
    sc = SparkContext(pyFiles=add_files)
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/context.py", line 112, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
>>> 

Can anyone help me with this? Maybe I missed something in the install process?

Thank you so much in advance.

6 REPLIES

Re: Spark 2 not working after upgrade. PySpark error

Master Guru
Can you check if the host you're executing 'pyspark' on has a Spark (1.6) Gateway plus a YARN Gateway role deployed on it? These would translate to valid /etc/hadoop/conf/ and /etc/spark/conf/ directories.
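
A quick way to verify this from the shell (a minimal check, assuming the default CDH client-configuration paths):

ls /etc/spark/conf/    # should contain spark-defaults.conf, spark-env.sh, etc. if a Spark (1.6) Gateway is deployed
ls /etc/hadoop/conf/   # should contain core-site.xml, yarn-site.xml, etc. if a YARN Gateway is deployed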

Re: Spark 2 not working after upgrade. PySpark error

Explorer

Hi Harsh, thank you for your reply.

The node where I'm executing pyspark doesn't have a Spark 1.6 Gateway role; should it have one?

It has the Spark 2 Gateway role, plus the JobHistoryServer, NodeManager, and ResourceManager roles for YARN.

Re: Spark 2 not working after upgrade. PySpark error

Master Guru
The command 'pyspark' is for Spark 1.6, so it certainly needs a Spark Gateway to function. If you want to use PySpark with Spark 2, the command is 'pyspark2' instead.
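
For example (a minimal sketch, assuming the Spark 2 parcel has placed its commands on your PATH):

pyspark     # CDH-included Spark 1.6 shell; needs the Spark (1.6) Gateway role
pyspark2    # Spark 2 shell installed by the separate SPARK2 parcel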

Re: Spark 2 not working after upgrade. PySpark error

Explorer

Okay, that's news to me. Since I want to use Spark 2, is it the same for spark-submit? Can I just submit my application as before now that Spark 2 is installed instead of Spark 1.6, or does this command also change for Spark 2?

Thank you so much.

Re: Spark 2 not working after upgrade. PySpark error

ACCEPTED SOLUTION

Master Guru
The standalone Spark 2.x is designed to co-exist with the CDH-included Spark 1.6, and as such all the commands differ. The command difference list is available at https://www.cloudera.com/documentation/spark2/latest/topics/spark_running_apps.html#spark2_commands
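
For example, here is a sketch of submitting the same application with each version (my_app.py stands in for your own script; adjust the flags for your application):

spark-submit --master yarn --deploy-mode client my_app.py     # CDH-included Spark 1.6
spark2-submit --master yarn --deploy-mode client my_app.py    # standalone Spark 2.x parcel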

Re: Spark 2 not working after upgrade. PySpark error

Explorer

That's helpful, and it's exactly what I missed. Thank you so much; I'm marking your last answer as the solution.

Best regards.