Spark 2 not working after upgrade. PySpark error
Labels: Apache Spark, Apache YARN, Cloudera Manager
Created on 11-07-2017 01:52 AM - edited 09-16-2022 05:29 AM
Hi all,
I had Spark 1.6 working with YARN in my cluster. I wanted to use Spark 2 for its DataFrame API, so I followed the instructions at this link to install it: https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html
Once I had installed Spark 2, trying to start pyspark from the console gives me the following stack trace:
/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/bin$ pyspark
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
        at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
        at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:123)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:109)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 7 more
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/shell.py", line 43, in <module>
    sc = SparkContext(pyFiles=add_files)
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/context.py", line 112, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
>>>
Can anyone help me with this? Maybe I missed something in the install process?
Thank you so much in advance.
Created 11-07-2017 10:25 PM
Does the host on which you are running 'pyspark' have a Spark (1.6) Gateway role deployed to it?
Created 11-07-2017 11:29 PM
Hi Harsh, thank you for your reply.
The node where I'm executing pyspark doesn't have a Spark 1.6 Gateway role; should it have one?
It has the Spark 2 Gateway role, plus the JobHistoryServer, NodeManager, and ResourceManager roles for YARN.
Created 11-07-2017 11:31 PM
The 'pyspark' command belongs to Spark 1.6 and requires a Spark 1.6 Gateway to function. If you want to use PySpark with Spark 2, the command is 'pyspark2' instead.
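For example, on a host carrying the Spark 2 Gateway role, a session along these lines should work (a minimal sketch; the shell output and DataFrame contents are illustrative):

    $ pyspark2
    ...
    >>> # The Spark 2 shell pre-creates a SparkSession named 'spark'
    >>> df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
    >>> df.show()
    +---+------+
    | id|letter|
    +---+------+
    |  1|     a|
    |  2|     b|
    +---+------+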
Created 11-07-2017 11:44 PM
Okay, that's news to me. Then, since I want to use Spark 2, is it the same for spark-submit? Can I just submit my application as before, now that Spark 2 is installed? Or does this command also change for Spark 2?
Thank you so much.
Created 11-07-2017 11:47 PM
Spark 2 is installed in parallel with Spark 1.6, and as such all the commands differ. The command difference list
is available at
https://www.cloudera.com/documentation/spark2/latest/topics/spark_running_apps.html#spark2_commands
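For example, where a Spark 1.6 submission looked like this (the script name and options are illustrative, and a Spark 2 Gateway role is assumed on the host):

    $ spark-submit --master yarn --deploy-mode client my_app.py

the Spark 2 equivalent is:

    $ spark2-submit --master yarn --deploy-mode client my_app.py

The same renaming applies to the shells, per the page above: spark-shell becomes spark2-shell, and pyspark becomes pyspark2.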
Created 11-07-2017 11:50 PM
That's very helpful and exactly what I was missing. Thank you so much; I'm marking your last answer as the solution.
Best regards.
