Explorer
Posts: 17
Registered: ‎08-08-2017

Spark 2 not working after upgrade. PySpark error


Hi all,

 

I had Spark 1.6 working with YARN in my cluster. I wanted to use Spark 2 for DataFrames, so I followed the instructions in this link to install it: https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html

 

Once I finally installed Spark 2, trying to start pyspark from the console gives me the following stack trace:

 

/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/bin$ pyspark
Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
	at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
	at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:123)
	at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:109)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/shell.py", line 43, in <module>
    sc = SparkContext(pyFiles=add_files)
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/context.py", line 112, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
>>> 

Can anyone help me with this? Maybe I missed something in the install process?


Thank you so much in advance.

Posts: 1,565
Kudos: 287
Solutions: 239
Registered: ‎07-31-2013

Re: Spark 2 not working after upgrade. PySpark error

Can you check if the host you're executing 'pyspark' on has a Spark (1.6) Gateway plus a YARN Gateway role deployed on it? These would translate to valid /etc/hadoop/conf/ and /etc/spark/conf/ directories.
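For reference, a quick check from the shell would look something like this (the paths below are the CDH defaults mentioned above; an alternatives-managed setup may point elsewhere):

$ ls /etc/hadoop/conf/    # should list core-site.xml, yarn-site.xml, ...
$ ls /etc/spark/conf/     # should list spark-defaults.conf, spark-env.sh, ...
$ ls /etc/spark2/conf/    # Spark 2 parcel client config, present if a Spark 2 Gateway role is deployed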
Backline Customer Operations Engineer
Explorer
Posts: 17
Registered: ‎08-08-2017

Re: Spark 2 not working after upgrade. PySpark error

Hi Harsh, thank you for your reply.

 

The node where I'm executing pyspark doesn't have a Spark 1.6 Gateway role. Should it have one?

It has a Spark 2 Gateway role, plus the JobHistoryServer, NodeManager, and ResourceManager roles for YARN.

Posts: 1,565
Kudos: 287
Solutions: 239
Registered: ‎07-31-2013

Re: Spark 2 not working after upgrade. PySpark error

The command 'pyspark' is for Spark 1.6, so it certainly needs a Spark 1.6 Gateway to function. If you want to use PySpark with Spark 2, the command is 'pyspark2' instead.
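On a host with the Spark 2 Gateway role deployed, a quick sanity check would look roughly like this (version strings and prompt output will differ by release):

$ pyspark2
...
>>> spark.version            # e.g. '2.1.0.cloudera1'
>>> spark.range(5).count()   # returns 5 if the context started correctly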
Backline Customer Operations Engineer
Explorer
Posts: 17
Registered: ‎08-08-2017

Re: Spark 2 not working after upgrade. PySpark error

Okay, that's news to me. Then, since I want to use Spark 2, is it the same for spark-submit? Do I just submit my application as before, now that Spark 2 is installed instead of Spark 1.6, or does that command also change for Spark 2?

 

Thank you so much.

Posts: 1,565
Kudos: 287
Solutions: 239
Registered: ‎07-31-2013

Re: Spark 2 not working after upgrade. PySpark error

The standalone Spark 2.x parcel is designed to co-exist with the CDH-included Spark 1.6, so all of its commands differ. The list of command differences is available at
https://www.cloudera.com/documentation/spark2/latest/topics/spark_running_apps.html#spark2_commands
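As a rough side-by-side of the most common ones (the linked page is the authoritative list):

  Spark 1.6 (CDH)    Spark 2 (parcel)
  spark-shell        spark2-shell
  spark-submit       spark2-submit
  pyspark            pyspark2

So submitting your application would look something like this (the script name and options here are just placeholders):

$ spark2-submit --master yarn --deploy-mode client my_app.py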
Backline Customer Operations Engineer
Explorer
Posts: 17
Registered: ‎08-08-2017

Re: Spark 2 not working after upgrade. PySpark error

That's helpful, and it's exactly what I was missing. Thank you so much; I'm marking your last answer as the solution.

 

Best regards.
