<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Use of Python version 3 scripts for pyspark with HDP 2.4 in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141129#M35500</link>
    <description>&lt;P&gt;I had this issue. I modified the imports section of the topology_script to be Python 3 compatible:&lt;/P&gt;&lt;PRE&gt;from __future__ import print_function
import sys, os
try:
    from string import join  # Python 2 only
except ImportError:
    # Python 3 removed string.join; mimic its (words, sep=" ") signature
    def join(words, sep=" "):
        return sep.join(words)
try:
    import ConfigParser  # Python 2 module name
except ImportError:  # ModuleNotFoundError only exists in Python 3.6+
    import configparser as ConfigParser
&lt;/PRE&gt;</description>
    <pubDate>Wed, 08 Mar 2017 02:16:38 GMT</pubDate>
    <dc:creator>kdunn926</dc:creator>
    <dc:date>2017-03-08T02:16:38Z</dc:date>
    <item>
      <title>Use of Python version 3 scripts for pyspark with HDP 2.4</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141125#M35496</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;We have a cluster running HDP 2.4.2.0, and we hit an issue when running a Python 3 script that uses spark-submit in spark-client mode.
When Spark starts, a Python exception is raised in hdp-select, and we deduced that it is a Python 2 vs. Python 3 problem.&lt;/P&gt;&lt;P&gt;A follow-up question: is there a trick or a right way to run Python 3 scripts with pyspark on HDP?&lt;/P&gt;&lt;P&gt;See the following trace:&lt;/P&gt;&lt;PRE&gt;File "/usr/bin/hdp-select", line 202
  print "ERROR: Invalid package - " + name
  ^
SyntaxError: Missing parentheses in call to 'print'&lt;/PRE&gt;&lt;P&gt;[...]&lt;/P&gt;&lt;PRE&gt;WARN ScriptBasedMapping: Exception running /etc/hadoop/conf/topology_script.py 172.28.15.90 
ExitCodeException exitCode=1:  File "/etc/hadoop/conf/topology_script.py", line 62
  print rack
  ^
SyntaxError: Missing parentheses in call to 'print'

   at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
   at org.apache.hadoop.util.Shell.run(Shell.java:487)
   at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
   at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251)
   at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188)
   at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)
   at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
   at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:81)
   at org.apache.spark.scheduler.cluster.YarnScheduler.getRackForHost(YarnScheduler.scala:38)
   at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:292)
   at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:284)
   at scala.collection.immutable.List.foreach(List.scala:318)
   at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:284)
   at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$DriverEndpoint$$makeOffers(CoarseGrainedSchedulerBackend.scala:196)
   at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:123)
   at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
   at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
   at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
   at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)&lt;/PRE&gt;</description>
      <pubDate>Thu, 21 Jul 2016 20:59:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141125#M35496</guid>
      <dc:creator>fabien_toral</dc:creator>
      <dc:date>2016-07-21T20:59:59Z</dc:date>
    </item>
    <item>
      <title>Re: Use of Python version 3 scripts for pyspark with HDP 2.4</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141126#M35497</link>
      <description>&lt;P&gt;Have you tried the solution in this thread? &lt;A href="https://community.hortonworks.com/questions/16094/pyspark-with-different-python-versions-on-yarn-is.html" target="_blank"&gt;https://community.hortonworks.com/questions/16094/pyspark-with-different-python-versions-on-yarn-is.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jul 2016 05:04:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141126#M35497</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-07-22T05:04:34Z</dc:date>
    </item>
    <item>
      <title>Re: Use of Python version 3 scripts for pyspark with HDP 2.4</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141127#M35498</link>
      <description>&lt;P&gt;Have you tried setting the PYSPARK_PYTHON environment variable?&lt;/P&gt;&lt;PRE&gt;export PYSPARK_PYTHON=/usr/local/bin/python3.3&lt;/PRE&gt;&lt;P&gt;Here is the documentation for configuration information: &lt;A target="_blank" href="https://spark.apache.org/docs/1.6.0/configuration.html#environment-variables"&gt;https://spark.apache.org/docs/1.6.0/configuration.html#environment-variables&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 26 Jul 2016 02:30:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141127#M35498</guid>
      <dc:creator>myoung</dc:creator>
      <dc:date>2016-07-26T02:30:18Z</dc:date>
    </item>
    <item>
      <title>Re: Use of Python version 3 scripts for pyspark with HDP 2.4</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141128#M35499</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/393/aervits.html" nodeid="393"&gt;@Artem Ervits&lt;/A&gt;, &lt;A rel="user" href="https://community.cloudera.com/users/2695/myoung.html" nodeid="2695"&gt;@Michael Young&lt;/A&gt;, thanks for your replies.&lt;/P&gt;&lt;P&gt;After more investigation, we found that the issue mentioned is
      not critical for our results: it seems to be raised by the underlying
      HDP stack and only pollutes our logs. We found our correct
      results in the mess and could continue our development.&lt;/P&gt;&lt;P&gt;PYSPARK_PYTHON and LD_LIBRARY_PATH are correctly set. We found a
      problem with PYTHONHASHSEED, but fixed it by setting it to an explicit
      value.&lt;/P&gt;&lt;P&gt;So I could mark the thread as resolved, but how would you explain
      the typical Python version error (the 'SyntaxError' on 'print'
      without parentheses in the hdp-select code) coming from HDP stack code?
      Could some coupling between HDP's Spark / YARN integration
      and other HDP stack modules break Python 3 compatibility?&lt;/P&gt;</description>
      <pubDate>Tue, 26 Jul 2016 20:49:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141128#M35499</guid>
      <dc:creator>fabien_toral</dc:creator>
      <dc:date>2016-07-26T20:49:44Z</dc:date>
    </item>
    <item>
      <title>Re: Use of Python version 3 scripts for pyspark with HDP 2.4</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141129#M35500</link>
      <description>&lt;P&gt;I had this issue. I modified the imports section of the topology_script to be Python 3 compatible:&lt;/P&gt;&lt;PRE&gt;from __future__ import print_function
import sys, os
try:
    from string import join  # Python 2 only
except ImportError:
    # Python 3 removed string.join; mimic its (words, sep=" ") signature
    def join(words, sep=" "):
        return sep.join(words)
try:
    import ConfigParser  # Python 2 module name
except ImportError:  # ModuleNotFoundError only exists in Python 3.6+
    import configparser as ConfigParser
&lt;/PRE&gt;</description>
      <pubDate>Wed, 08 Mar 2017 02:16:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141129#M35500</guid>
      <dc:creator>kdunn926</dc:creator>
      <dc:date>2017-03-08T02:16:38Z</dc:date>
    </item>
    <item>
      <title>Re: Use of Python version 3 scripts for pyspark with HDP 2.4</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141130#M35501</link>
      <description>&lt;P&gt;We've done this and changed the HDFS configuration in Ambari to have &lt;/P&gt;&lt;PRE&gt;net.topology.script.file.name=/etc/hadoop/conf/topology_script.py&lt;/PRE&gt;&lt;P&gt;The only problem is that when we restart HDFS this file gets overwritten. How do I stop this behaviour?&lt;/P&gt;</description>
      <pubDate>Fri, 02 Jun 2017 18:14:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141130#M35501</guid>
      <dc:creator>jason_breitweg</dc:creator>
      <dc:date>2017-06-02T18:14:39Z</dc:date>
    </item>
    <item>
      <title>Re: Use of Python version 3 scripts for pyspark with HDP 2.4</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141131#M35502</link>
      <description>&lt;P&gt;Dr. Breitweg, you'll need to make the change through Ambari rather than manually editing the config file. Please refer to the following page:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.0.3/bk_ambari-operations/content/set_rack_id_individual_hosts.html"&gt;https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.0.3/bk_ambari-operations/content/set_rack_id_individual_hosts.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 03 Jun 2017 02:10:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141131#M35502</guid>
      <dc:creator>kdunn926</dc:creator>
      <dc:date>2017-06-03T02:10:18Z</dc:date>
    </item>
    <item>
      <title>Re: Use of Python version 3 scripts for pyspark with HDP 2.4</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141132#M35503</link>
      <description>&lt;P&gt;Hi Toral,&lt;/P&gt;&lt;P&gt;Could you please explain clearly how you solved this error? I'm getting the same one.&lt;/P&gt;</description>
      <pubDate>Sun, 17 Feb 2019 04:47:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141132#M35503</guid>
      <dc:creator>mhrtejadsmlai</dc:creator>
      <dc:date>2019-02-17T04:47:17Z</dc:date>
    </item>
    <item>
      <title>Re: Use of Python version 3 scripts for pyspark with HDP 2.4</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141133#M35504</link>
      <description>&lt;P&gt;It works on HDP-3.1.0.0 with Python 3.7.&lt;/P&gt;&lt;P&gt;Thanks! You've saved my day.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2019 17:07:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Use-of-Python-version-3-scripts-for-pyspark-with-HDP-2-4/m-p/141133#M35504</guid>
      <dc:creator>moises_c</dc:creator>
      <dc:date>2019-07-02T17:07:21Z</dc:date>
    </item>
  </channel>
</rss>

