topology.py not Python 3 compatible

New Contributor

I don't know who's responsible for writing topology.py, but it uses Python 2 syntax, so when I try to run PySpark with Python 3 by setting

export PYSPARK_PYTHON=python3

I get tons of stack traces.


Uri

Accepted Solutions

Re: topology.py not Python 3 compatible

This was fixed in CM 5.9.

Re: topology.py not Python 3 compatible

Depending on how you run the job, you may be able to override the topology script parameter and/or replace the topology.py script with one that is Python 3 compatible. If you submit jobs from the command line, you would usually copy /etc/hadoop/conf to a custom directory such as /path/to/customized/conf, make your changes there, then set HADOOP_CONF_DIR=/path/to/customized/conf and run your job.
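
The copy-and-override steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name prepare_custom_conf is hypothetical, the script is assumed to be named topology.py inside the configuration directory, and the topology script parameter is assumed to be the Hadoop property net.topology.script.file.name in core-site.xml. Check your own configuration before relying on any of these.

```python
import os
import shutil
import xml.etree.ElementTree as ET

# Assumption: this is the Hadoop property that names the topology script.
TOPOLOGY_PROPERTY = "net.topology.script.file.name"


def prepare_custom_conf(src_conf, dst_conf, patched_script_text):
    """Copy src_conf to dst_conf and swap in a patched topology script.

    Returns a copy of the current environment with HADOOP_CONF_DIR set to
    dst_conf. dst_conf must not exist yet (a shutil.copytree requirement).
    """
    shutil.copytree(src_conf, dst_conf)

    # Overwrite the copied script with the Python 3 compatible version.
    script_path = os.path.join(dst_conf, "topology.py")
    with open(script_path, "w") as handle:
        handle.write(patched_script_text)
    os.chmod(script_path, 0o755)

    # If core-site.xml configures the topology script, point it at the copy.
    # Note: ElementTree drops XML comments when it rewrites the file.
    core_site = os.path.join(dst_conf, "core-site.xml")
    if os.path.exists(core_site):
        tree = ET.parse(core_site)
        for prop in tree.getroot().findall("property"):
            value = prop.find("value")
            if prop.findtext("name") == TOPOLOGY_PROPERTY and value is not None:
                value.text = script_path
        tree.write(core_site)

    env = os.environ.copy()
    env["HADOOP_CONF_DIR"] = dst_conf
    return env
```

The returned environment can be passed to whatever launches the job, for example through the env argument of subprocess.call, so the change stays local to that job and does not touch the cluster-managed /etc/hadoop/conf.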

 

Assuming you can change that topology script, here's the relevant portion of the diff that you can apply:

@@ -1,8 +1,8 @@
 #!/usr/bin/env python
 #
-# Copyright (c) 2010-2012 Cloudera, Inc. All rights reserved.
+# Copyright (c) 2016 Cloudera, Inc. All rights reserved.
 #
- 
+
 '''
 This script is provided by CMF for hadoop to determine network/rack topology.
 It is automatically generated and could be replaced at any time. Any changes
@@ -12,8 +12,13 @@ made to it will be lost when this happens.
 import os
 import sys
 import xml.dom.minidom
-from string import join
- 
+
+try:
+  xrange
+except NameError:
+  # support for python3, which basically renamed xrange to range
+  xrange = range
+
 def main():
   MAP_FILE = '{{CMF_CONF_DIR}}/topology.map'
   DEFAULT_RACK = '/default'
@@ -40,14 +45,14 @@ def main():
       map[node.getAttribute("name")] = node.getAttribute("rack")
   except:
     default_rack = "".join([ DEFAULT_RACK for _ in xrange(max_elements)])
-    print default_rack
+    print(default_rack)
     return -1
- 
+
   default_rack = "".join([ DEFAULT_RACK for _ in xrange(max_elements)])
   if len(sys.argv)==1:
-    print default_rack
+    print(default_rack)
   else:
-    print join([map.get(i, default_rack) for i in sys.argv[1:]], " ")
+    print(" ".join([map.get(i, default_rack) for i in sys.argv[1:]]))
   return 0
 
 if __name__ == "__main__":
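
For readers who cannot apply the diff directly, the patched lookup logic can be sketched as a standalone function that runs unchanged on Python 2 and Python 3. This is a sketch and not the script Cloudera Manager generates: the function names, the `<topology>` root element in the sample, and the way max_elements is derived (the number of "/" separators in the deepest rack path) are assumptions, since the diff does not show that part of the file.

```python
import xml.dom.minidom

DEFAULT_RACK = "/default"


def load_rack_map(map_xml):
    """Parse topology XML text into ({host: rack}, max_elements)."""
    dom = xml.dom.minidom.parseString(map_xml)
    rack_map = {}
    max_elements = 1
    for node in dom.getElementsByTagName("node"):
        rack = node.getAttribute("rack")
        # Assumption: depth is the number of "/" separators in the rack path.
        max_elements = max(max_elements, rack.count("/"))
        rack_map[node.getAttribute("name")] = rack
    return rack_map, max_elements


def resolve_racks(map_xml, hosts):
    """Return the rack paths for hosts as one space-separated string."""
    try:
        rack_map, max_elements = load_rack_map(map_xml)
    except Exception:
        # Unreadable map: fall back to a single-level default rack.
        rack_map, max_elements = {}, 1
    # range exists on both Python 2 and 3, so no xrange shim is needed here.
    default_rack = "".join([DEFAULT_RACK for _ in range(max_elements)])
    if not hosts:
        return default_rack
    # str.join replaces the removed string.join function.
    return " ".join([rack_map.get(h, default_rack) for h in hosts])


if __name__ == "__main__":
    sample = ('<topology>'
              '<node name="host1" rack="/dc1/rack1"/>'
              '<node name="host2" rack="/dc1/rack2"/>'
              '</topology>')
    print(resolve_racks(sample, ["host1", "unknown-host"]))
```

Unknown hosts get a default rack padded to the same depth as the deepest known rack, which mirrors the "".join(...) expression in the diff.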

 

5 REPLIES

Re: topology.py not Python 3 compatible

New Contributor

Any workaround for earlier versions? I'm on 5.5 and I don't manage the cluster.

Re: topology.py not Python 3 compatible

Explorer

Hi,

I manage a cluster running CDH 5.7.2 and I have the same problem. Can I change some parameters in CM to solve it?

Re: topology.py not Python 3 compatible

New Contributor

One easy option in old CDH versions is to change the shebang at the beginning of the script to:

#!/usr/bin/env python2

This is actually more precise than the default, because the script is a Python 2 script and Python 3 is coming, so it is good to be specific about the interpreter version the script needs. We could be even more explicit and require python2.7.
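
If the change has to be made on several hosts, the shebang edit could be scripted. The helper below is a hypothetical sketch (pin_shebang is not part of any Cloudera tooling). It assumes the script is run directly through its shebang line and that a python2 executable is on the PATH of the user running it.

```python
def pin_shebang(script_text, interpreter="python2"):
    """Return script_text with its first line set to an env shebang.

    An existing shebang is replaced; if there is none, one is prepended.
    """
    shebang = "#!/usr/bin/env " + interpreter
    lines = script_text.split("\n")
    if lines and lines[0].startswith("#!"):
        lines[0] = shebang
    else:
        lines.insert(0, shebang)
    return "\n".join(lines)


if __name__ == "__main__":
    original = "#!/usr/bin/env python\nimport os\n"
    print(pin_shebang(original))
```

As the script's own docstring warns, the file is generated automatically, so an edit like this may be lost whenever the file is regenerated.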