Hi all,
I had Spark 1.6 running on my cluster with YARN. I wanted to move to Spark 2 because of DataFrames, so I followed the instructions at this link to install it: https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html
Now that Spark 2 is installed, starting pyspark from the console gives me the following stack trace:
/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/bin$ pyspark
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
    at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
    at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:123)
    at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:109)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 7 more
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/shell.py", line 43, in <module>
    sc = SparkContext(pyFiles=add_files)
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/context.py", line 112, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
>>>
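Every path in the trace sits under the CDH parcel's lib/spark directory, so the pyspark being launched here is presumably the one bundled with CDH rather than the Spark 2 install. In case it helps with diagnosis, here is a minimal sketch that prints where each launcher name resolves on the node. The "2"-suffixed names (pyspark2, spark2-submit) are an assumption about how the Spark 2 parcel names its commands, and list_spark_launchers is just a hypothetical helper name.

```shell
#!/bin/sh
# Print where each Spark launcher name resolves on this node's PATH.
# Assumption: the Spark 2 parcel installs "2"-suffixed launchers
# (pyspark2, spark2-submit) next to the CDH-bundled ones.
list_spark_launchers() {
  for cmd in pyspark pyspark2 spark-submit spark2-submit; do
    if path=$(command -v "$cmd" 2>/dev/null); then
      # Follow symlinks so the owning parcel directory is visible.
      echo "$cmd -> $(readlink -f "$path")"
    else
      echo "$cmd -> not found"
    fi
  done
}

list_spark_launchers
```

If a name reports "not found", that launcher is not on the PATH of this node, which may point at a missing gateway role or client configuration.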
Can anyone help me with this? Maybe I missed something in the install process?
Thank you so much in advance.
Hi Harsh, thank you for your reply.
The node where I'm running pyspark doesn't have a Spark 1.6 Gateway role. Should it have one?
It has the Spark 2 Gateway role, plus the JobHistoryServer, NodeManager and ResourceManager roles for YARN.
Okay, that is news to me. Since I want to use Spark 2, is it the same for spark-submit? Do I just submit my application as before now that Spark 2 is installed instead of Spark 1.6, or does that command change for Spark 2 as well?
Thank you so much.