
Spark 2 not working after upgrade. PySpark error

Explorer

Hi all,

 

I had Spark 1.6 working with YARN in my cluster. I wanted to use Spark 2 because of DataFrames, and I followed the instructions at this link to install it: https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html

 

Once I finally installed Spark 2, when I try to start pyspark from the console it gives me the following stack trace:

 

/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/bin$ pyspark
Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
	at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
	at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:123)
	at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:109)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/shell.py", line 43, in <module>
    sc = SparkContext(pyFiles=add_files)
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/context.py", line 112, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
>>> 

Can anyone help me with this? Maybe I missed something in the install process?


Thank you so much in advance.

1 ACCEPTED SOLUTION

Mentor
The standalone Spark 2.x is designed to co-exist with the CDH-included Spark 1.6, and as such all of its commands differ. The full list of command differences is available at
https://www.cloudera.com/documentation/spark2/latest/topics/spark_running_apps.html#spark2_commands


6 REPLIES

Mentor
Can you check if the host you're executing 'pyspark' on has a Spark (1.6) Gateway plus a YARN Gateway role deployed on it? These would translate to valid /etc/hadoop/conf/ and /etc/spark/conf/ directories.
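
For example, a quick sanity check from the shell (just a sketch; exact file listings vary by cluster, but both directories should exist and contain client configuration files):

ls /etc/hadoop/conf/
ls /etc/spark/conf/

If either directory is missing or empty, the corresponding Gateway role is most likely not deployed on that host.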

Explorer

Hi Harsh, thank you for your reply.

 

The node where I'm executing pyspark doesn't have a Spark 1.6 Gateway role; should it have one?

It has the Spark 2 Gateway role, plus the JobHistoryServer, NodeManager and ResourceManager roles for YARN.

Mentor
The command 'pyspark' is for Spark 1.6, so it certainly needs a Spark (1.6) Gateway to function. If you want to use PySpark with Spark 2, the command is 'pyspark2' instead.
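
For example, a minimal session to verify the Spark 2 shell works (a sketch assuming the Spark 2 Gateway is deployed on the host, as in your setup; banner output omitted):

$ pyspark2
>>> df = spark.range(5)
>>> df.show()

Note that the Spark 2 shell exposes a SparkSession as 'spark' in addition to the 'sc' and 'sqlContext' objects you get in the 1.6 shell.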

Explorer

Okay, that's news to me. Since I want to use Spark 2, is it the same for spark-submit? Do I just submit my application as usual now that Spark 2 is installed, or does that command also change for Spark 2?

 

Thank you so much.

Mentor
The standalone Spark 2.x is designed to co-exist with the CDH-included Spark 1.6, and as such all of its commands differ. The full list of command differences is available at
https://www.cloudera.com/documentation/spark2/latest/topics/spark_running_apps.html#spark2_commands
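
For reference, the renames follow one pattern (see the linked page for the complete list):

spark-shell   ->  spark2-shell
spark-submit  ->  spark2-submit
pyspark       ->  pyspark2

So a Spark 2 submission would look like, for example:

spark2-submit --master yarn --deploy-mode client your_app.py

where 'your_app.py' is a placeholder for your own application script.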

Explorer

That's helpful, and it was exactly what I was missing. Thank you so much; I'm marking your last answer as the solution.

 

Best regards.