New Contributor
Posts: 2
Registered: ‎02-14-2017
Accepted Solution

topology.py not Python 3 compatible

I don't know who's responsible for writing topology.py, but it uses Python 2 syntax, so if I try to run PySpark with Python 3 using

export PYSPARK_PYTHON=python3

I get tons of stack traces.

Uri

Cloudera Employee
Posts: 508
Registered: ‎07-30-2013

Re: topology.py not Python 3 compatible

New Contributor
Posts: 2
Registered: ‎02-14-2017

Re: topology.py not Python 3 compatible

Any workaround for earlier versions?  I'm on 5.5 and I don't manage the cluster.

Cloudera Employee
Posts: 508
Registered: ‎07-30-2013

Re: topology.py not Python 3 compatible

Depending on how you are running the job, you may be able to override the topology script parameter and/or replace the topology.py script with one that is Python 3 compatible. If you're submitting jobs from the command line, you'd usually copy /etc/hadoop/conf to some custom directory /path/to/customized/conf, make changes there, then set HADOOP_CONF_DIR=/path/to/customized/conf and run your job.
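The copy-and-override step can be scripted. What follows is only a rough sketch, not an official Cloudera tool. It assumes the client configuration sets the standard Hadoop property net.topology.script.file.name in core-site.xml; the function name make_custom_conf and its arguments are made up for illustration. Note that ElementTree drops XML comments when it rewrites the file.

```python
import os
import shutil
import xml.etree.ElementTree as ET

# Standard Hadoop property that names the rack-topology script.
# Assumption: your client config sets it in core-site.xml.
TOPOLOGY_PROPERTY = "net.topology.script.file.name"


def make_custom_conf(src_conf, dst_conf, new_script):
    """Copy src_conf to dst_conf and point the topology property at new_script.

    Hypothetical helper. dst_conf must not exist yet. Returns the previous
    script path, or None if the property was not found.
    """
    shutil.copytree(src_conf, dst_conf)
    core_site = os.path.join(dst_conf, "core-site.xml")
    tree = ET.parse(core_site)
    previous = None
    for prop in tree.getroot().findall("property"):
        if prop.findtext("name") == TOPOLOGY_PROPERTY:
            value = prop.find("value")
            if value is None:
                # Property declared without a value element: add one.
                value = ET.SubElement(prop, "value")
            previous = value.text
            value.text = new_script
    tree.write(core_site)
    return previous
```

You would call it with something like make_custom_conf("/etc/hadoop/conf", "/path/to/customized/conf", "/path/to/customized/conf/topology.py"), put your Python 3 compatible topology.py at that location, and then export HADOOP_CONF_DIR=/path/to/customized/conf before submitting the job.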

 

Assuming you can change that topology script, here's the relevant portion of the diff that you can apply:

@@ -1,8 +1,8 @@
 #!/usr/bin/env python
 #
-# Copyright (c) 2010-2012 Cloudera, Inc. All rights reserved.
+# Copyright (c) 2016 Cloudera, Inc. All rights reserved.
 #
- 
+
 '''
 This script is provided by CMF for hadoop to determine network/rack topology.
 It is automatically generated and could be replaced at any time. Any changes
@@ -12,8 +12,13 @@ made to it will be lost when this happens.
 import os
 import sys
 import xml.dom.minidom
-from string import join
- 
+
+try:
+  xrange
+except NameError:
+  # support for python3, which basically renamed xrange to range
+  xrange = range
+
 def main():
   MAP_FILE = '{{CMF_CONF_DIR}}/topology.map'
   DEFAULT_RACK = '/default'
@@ -40,14 +45,14 @@ def main():
       map[node.getAttribute("name")] = node.getAttribute("rack")
   except:
     default_rack = "".join([ DEFAULT_RACK for _ in xrange(max_elements)])
-    print default_rack
+    print(default_rack)
     return -1
- 
+
   default_rack = "".join([ DEFAULT_RACK for _ in xrange(max_elements)])
   if len(sys.argv)==1:
-    print default_rack
+    print(default_rack)
   else:
-    print join([map.get(i, default_rack) for i in sys.argv[1:]], " ")
+    print(" ".join([map.get(i, default_rack) for i in sys.argv[1:]]))
   return 0

 if __name__ == "__main__":
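If it helps to see what the patched logic does without applying the diff, here is a standalone sketch that runs under both Python 2 and Python 3. It is not the shipped script: the function name resolve_racks is made up, it takes the map as a string instead of reading topology.map, and the <topology> root element in the usage example is an assumption. Only the node elements with name and rack attributes come from the script above.

```python
import xml.dom.minidom


def resolve_racks(map_xml, hosts, default_rack="/default"):
    """Return the space-separated racks for hosts, as the script would print them.

    Hypothetical re-statement of the patched lookup logic; map_xml is the
    text of a topology map, hosts is a list of host names or IPs.
    """
    racks = {}
    # Keep the default rack as deep as the deepest known rack path.
    max_elements = 1
    try:
        dom = xml.dom.minidom.parseString(map_xml)
        for node in dom.getElementsByTagName("node"):
            rack = node.getAttribute("rack")
            max_elements = max(max_elements, rack.count("/"))
            racks[node.getAttribute("name")] = rack
    except Exception:
        # Unreadable map: fall back to the default rack, like the script does.
        return default_rack * max_elements

    fallback = default_rack * max_elements
    if not hosts:
        return fallback
    # str.join works on both Python 2 and 3, unlike string.join.
    return " ".join(racks.get(host, fallback) for host in hosts)
```

For example, with the map text '<topology><node name="h1" rack="/dc1/r1"/></topology>', calling resolve_racks(map_text, ["h1", "other"]) gives "/dc1/r1 /default/default", because unknown hosts get a default rack with the same depth as the known ones.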

 

zbz
New Contributor
Posts: 5
Registered: ‎10-07-2018

Re: topology.py not Python 3 compatible


Hi,

I manage a cluster running CDH 5.7.2 and I'm seeing the same problem. Is there a parameter I can change in CM to fix it?
