<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Spark 2 not working after upgrade. PySpark error - Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Spark-2-not-working-after-upgrade-PySpark-error/m-p/61569#M52077</link>
    <description>&lt;P&gt;Hi all,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I had Spark 1.6 running on my cluster with YARN. I wanted to move to Spark 2 for its DataFrame support, so I followed the installation instructions at &lt;A href="https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html" target="_blank"&gt;https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Once Spark 2 was installed, trying to start pyspark from the console gave me the following stack trace:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/bin$ pyspark
Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
	at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
	at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:123)
	at org.apache.spark.deploy.SparkSubmitArguments.&amp;lt;init&amp;gt;(SparkSubmitArguments.scala:109)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/shell.py", line 43, in &amp;lt;module&amp;gt;
    sc = SparkContext(pyFiles=add_files)
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/context.py", line 112, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
&amp;gt;&amp;gt;&amp;gt; &lt;/PRE&gt;
&lt;P&gt;Can anyone help me with this? Maybe I missed something in the install process?&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Thank you so much in advance.&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 12:29:37 GMT</pubDate>
    <dc:creator>josholsan</dc:creator>
    <dc:date>2022-09-16T12:29:37Z</dc:date>
    <item>
      <title>Spark 2 not working after upgrade. PySpark error</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-2-not-working-after-upgrade-PySpark-error/m-p/61569#M52077</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I had Spark 1.6 running on my cluster with YARN. I wanted to move to Spark 2 for its DataFrame support, so I followed the installation instructions at &lt;A href="https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html" target="_blank"&gt;https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Once Spark 2 was installed, trying to start pyspark from the console gave me the following stack trace:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/bin$ pyspark
Python 2.7.6 (default, Oct 26 2016, 20:30:19) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
	at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
	at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:123)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:123)
	at org.apache.spark.deploy.SparkSubmitArguments.&amp;lt;init&amp;gt;(SparkSubmitArguments.scala:109)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/shell.py", line 43, in &amp;lt;module&amp;gt;
    sc = SparkContext(pyFiles=add_files)
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/context.py", line 112, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
&amp;gt;&amp;gt;&amp;gt; &lt;/PRE&gt;
&lt;P&gt;Can anyone help me with this? Maybe I missed something in the install process?&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Thank you so much in advance.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 12:29:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-2-not-working-after-upgrade-PySpark-error/m-p/61569#M52077</guid>
      <dc:creator>josholsan</dc:creator>
      <dc:date>2022-09-16T12:29:37Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 2 not working after upgrade. PySpark error</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-2-not-working-after-upgrade-PySpark-error/m-p/61601#M52078</link>
      <description>Can you check if the host you're executing 'pyspark' on has a Spark (1.6) Gateway plus a YARN Gateway role deployed on it? These would translate to valid /etc/hadoop/conf/ and /etc/spark/conf/ directories.</description>
      <pubDate>Wed, 08 Nov 2017 06:25:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-2-not-working-after-upgrade-PySpark-error/m-p/61601#M52078</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2017-11-08T06:25:54Z</dc:date>
    </item>
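The check Harsh suggests can be scripted. A minimal sketch, assuming the default CDH client-config locations named in the reply (/etc/hadoop/conf and /etc/spark/conf, which a deployed YARN/Spark Gateway role populates):

```shell
# Verify that the gateway client configs exist and are non-empty on this host.
# A missing or empty directory means the Gateway role (plus "Deploy Client
# Configuration") has not been run for that service here.
for d in /etc/hadoop/conf /etc/spark/conf; do
  if [ -d "$d" ] && [ -n "$(ls -A "$d" 2>/dev/null)" ]; then
    echo "OK: $d is present and non-empty"
  else
    echo "MISSING: $d (deploy the Gateway role and client configuration)"
  fi
done
```

A MISSING result for /etc/spark/conf would explain the NoClassDefFoundError above: without the client configs, spark-submit cannot assemble the Hadoop classpath that provides FSDataInputStream.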
    <item>
      <title>Re: Spark 2 not working after upgrade. PySpark error</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-2-not-working-after-upgrade-PySpark-error/m-p/61612#M52079</link>
      <description>&lt;P&gt;Hi Harsh, thank you for your reply.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The node where I'm executing pyspark doesn't have a Spark 1.6 Gateway role; should it have one?&lt;/P&gt;&lt;P&gt;It has the Spark 2 Gateway role, plus the JobHistoryServer, NodeManager and ResourceManager roles for YARN.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Nov 2017 07:29:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-2-not-working-after-upgrade-PySpark-error/m-p/61612#M52079</guid>
      <dc:creator>josholsan</dc:creator>
      <dc:date>2017-11-08T07:29:34Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 2 not working after upgrade. PySpark error</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-2-not-working-after-upgrade-PySpark-error/m-p/61613#M52080</link>
      <description>The command 'pyspark' is for Spark 1.6, so it certainly needs a Spark Gateway to function. If you want to use PySpark with Spark 2, the command is 'pyspark2' instead.</description>
      <pubDate>Wed, 08 Nov 2017 07:31:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-2-not-working-after-upgrade-PySpark-error/m-p/61613#M52080</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2017-11-08T07:31:37Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 2 not working after upgrade. PySpark error</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-2-not-working-after-upgrade-PySpark-error/m-p/61614#M52081</link>
      <description>&lt;P&gt;Okay, that's news to me. Since I want to use Spark 2, is it the same for spark-submit? Do I just submit my application with Spark 2 installed instead of Spark 1.6, or does that command also change for Spark 2?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you so much.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Nov 2017 07:44:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-2-not-working-after-upgrade-PySpark-error/m-p/61614#M52081</guid>
      <dc:creator>josholsan</dc:creator>
      <dc:date>2017-11-08T07:44:58Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 2 not working after upgrade. PySpark error</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-2-not-working-after-upgrade-PySpark-error/m-p/61615#M52082</link>
      <description>The standalone Spark 2.x is designed to co-exist with the CDH-included Spark 1.6, and as such all the commands differ. The command difference list is available at &lt;A href="https://www.cloudera.com/documentation/spark2/latest/topics/spark_running_apps.html#spark2_commands" target="_blank"&gt;https://www.cloudera.com/documentation/spark2/latest/topics/spark_running_apps.html#spark2_commands&lt;/A&gt;</description>
      <pubDate>Wed, 08 Nov 2017 07:47:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-2-not-working-after-upgrade-PySpark-error/m-p/61615#M52082</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2017-11-08T07:47:37Z</dc:date>
    </item>
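The renaming described above follows one pattern. A small sketch of the mapping (based on the Cloudera Spark 2 docs linked in the reply; printed here for reference):

```shell
# CDS 2 installs its CLI entry points alongside the Spark 1.6 ones,
# with a "2" appended so both versions can coexist on one host.
cat <<'EOF'
spark-shell   -> spark2-shell
spark-submit  -> spark2-submit
pyspark       -> pyspark2
EOF
```

So for the question above: yes, submitting a Spark 2 application uses spark2-submit, not spark-submit.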
    <item>
      <title>Re: Spark 2 not working after upgrade. PySpark error</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-2-not-working-after-upgrade-PySpark-error/m-p/61616#M52083</link>
      <description>&lt;P&gt;That's helpful and it's exactly what I missed. Thank you so much; I'm marking your last answer as the solution.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Best regards.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Nov 2017 07:50:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-2-not-working-after-upgrade-PySpark-error/m-p/61616#M52083</guid>
      <dc:creator>josholsan</dc:creator>
      <dc:date>2017-11-08T07:50:03Z</dc:date>
    </item>
  </channel>
</rss>

