topology.py not Python 3 compatible

SOLVED

New Contributor

I don't know who is responsible for topology.py, but it uses Python 2 syntax, so when I run PySpark under Python 3 with

export PYSPARK_PYTHON=python3

I get a flood of stack traces.

Uri
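
To make the failure concrete, here is a minimal sketch that feeds two Python 2 constructs to the current interpreter and reports what goes wrong. The two sample lines are the ones the patch in this thread replaces; the helper name `check_line` is made up for illustration and is not part of topology.py.

```python
# Lines that topology.py contained before the fix (see the diff in this thread).
PY2_ONLY_LINES = [
    "print default_rack",        # print statement: a SyntaxError on Python 3
    "from string import join",   # string.join was removed: an ImportError on Python 3
]


def check_line(source):
    """Return None if `source` runs on this interpreter, else the error's class name."""
    try:
        code = compile(source, "topology.py", "exec")
        exec(code, {"default_rack": "/default"})
    except (SyntaxError, ImportError) as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    for line in PY2_ONLY_LINES:
        print("%-28s -> %s" % (line, check_line(line)))
```

Under Python 3 both lines are rejected, which is presumably why every invocation of the script produces a stack trace.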

Accepted Solutions

Re: topology.py not Python 3 compatible

This was fixed in CM 5.9.

Re: topology.py not Python 3 compatible

Depending on how you are running the job, you may be able to override the topology script parameter and/or replace the topology.py script with one that is Python 3 compatible. If you're submitting jobs from the command line, you'd usually copy /etc/hadoop/conf to some custom directory /path/to/customized/conf, make changes there, then set HADOOP_CONF_DIR=/path/to/customized/conf and run your job.
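
The copy-and-override steps could be scripted roughly as follows. This is only a sketch under assumptions: it assumes the copied directory contains a core-site.xml whose `net.topology.script.file.name` property points at the topology script, and the helper names `make_custom_conf` and `job_environment` are invented for this example.

```python
import os
import shutil
import xml.etree.ElementTree as ET

# Hadoop property that names the topology script (assumed to be set in core-site.xml).
TOPOLOGY_PROPERTY = "net.topology.script.file.name"


def make_custom_conf(src_conf, dst_conf, script_name="topology.py"):
    """Copy a Hadoop conf directory and point the topology property at the copy's script."""
    shutil.copytree(src_conf, dst_conf)  # dst_conf must not exist yet
    script_path = os.path.join(dst_conf, script_name)
    core_site = os.path.join(dst_conf, "core-site.xml")
    tree = ET.parse(core_site)
    for prop in tree.getroot().iter("property"):
        value = prop.find("value")
        if prop.findtext("name") == TOPOLOGY_PROPERTY and value is not None:
            value.text = script_path
    # Note: rewriting with ElementTree drops XML comments from the file.
    tree.write(core_site)
    return script_path


def job_environment(conf_dir, python="python3"):
    """Build the environment for a job that should use the customized conf directory."""
    env = dict(os.environ)
    env["HADOOP_CONF_DIR"] = conf_dir
    env["PYSPARK_PYTHON"] = python
    return env
```

After `make_custom_conf("/etc/hadoop/conf", "/path/to/customized/conf")`, the script at the returned path is the one to patch, and the dictionary from `job_environment` can be passed as `env=` to `subprocess.call` when launching the job.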

 

Assuming you can change that topology script, here's the relevant portion of the diff that you can apply:

@@ -1,8 +1,8 @@
 #!/usr/bin/env python
 #
-# Copyright (c) 2010-2012 Cloudera, Inc. All rights reserved.
+# Copyright (c) 2016 Cloudera, Inc. All rights reserved.
 #
- 
+
 '''
 This script is provided by CMF for hadoop to determine network/rack topology.
 It is automatically generated and could be replaced at any time. Any changes
@@ -12,8 +12,13 @@ made to it will be lost when this happens.
 import os
 import sys
 import xml.dom.minidom
-from string import join
- 
+
+try:
+  xrange
+except NameError:
+  # support for python3, which basically renamed xrange to range
+  xrange = range
+
 def main():
   MAP_FILE = '{{CMF_CONF_DIR}}/topology.map'
   DEFAULT_RACK = '/default'
@@ -40,14 +45,14 @@ def main():
       map[node.getAttribute("name")] = node.getAttribute("rack")
   except:
     default_rack = "".join([ DEFAULT_RACK for _ in xrange(max_elements)])
-    print default_rack
+    print(default_rack)
     return -1
- 
+
   default_rack = "".join([ DEFAULT_RACK for _ in xrange(max_elements)])
   if len(sys.argv)==1:
-    print default_rack
+    print(default_rack)
   else:
-    print join([map.get(i, default_rack) for i in sys.argv[1:]], " ")
+    print(" ".join([map.get(i, default_rack) for i in sys.argv[1:]]))
   return 0

 if __name__ == "__main__":
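
For anyone who wants to check the patched logic without applying the diff, here is a minimal sketch of the same lookup as a standalone function that runs on both Python 2 and Python 3. The function name `resolve_racks` and the `<topology>` root element in the sample are assumptions for illustration; only the `node` elements with `name` and `rack` attributes are taken from the script.

```python
import xml.dom.minidom

try:
    xrange
except NameError:
    # Python 3: xrange no longer exists, range behaves the same way here.
    xrange = range

DEFAULT_RACK = "/default"


def resolve_racks(map_xml, hosts):
    """Return the space-separated racks for `hosts`, as the patched script would print them."""
    max_elements = 1
    rack_by_host = {}
    dom = xml.dom.minidom.parseString(map_xml)
    for node in dom.getElementsByTagName("node"):
        rack = node.getAttribute("rack")
        # Keep the default rack as deep as the deepest rack in the map.
        max_elements = max(max_elements, rack.count("/"))
        rack_by_host[node.getAttribute("name")] = rack
    default_rack = "".join([DEFAULT_RACK for _ in xrange(max_elements)])
    if not hosts:
        return default_rack
    # str.join replaces the Python 2-only string.join helper.
    return " ".join([rack_by_host.get(h, default_rack) for h in hosts])


if __name__ == "__main__":
    # Hypothetical map contents; the real file is generated by Cloudera Manager.
    sample = '<topology><node name="host1" rack="/dc1/rack1"/></topology>'
    print(resolve_racks(sample, ["host1", "host2"]))
```

Hosts missing from the map fall back to the default rack, repeated to match the depth of the deepest rack in the map.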

 

Replies

Re: topology.py not Python 3 compatible

New Contributor

Any workaround for earlier versions?  I'm on 5.5 and I don't manage the cluster.

Re: topology.py not Python 3 compatible

Explorer

Hi,

I manage a cluster running CDH 5.7.2 and have the same problem. Is there a parameter I can change in CM to solve it?

Re: topology.py not Python 3 compatible

New Contributor

One easy option in old CDH versions is just to change the shebang at the beginning of the script to:

 

#!/usr/bin/env python2

This is actually more precise than the default: the script is a Python 2 script, and with Python 3 on the way it is good to be specific about which interpreter version the script needs. We could be even more explicit and require python2.7.
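
If you would rather script that change than edit the file by hand, something along these lines should work. It is a sketch only: `pin_shebang` is a made-up helper name, and because the script's own header says it is generated and may be replaced at any time, the edit may need to be reapplied.

```python
def pin_shebang(path, interpreter="python2"):
    """Rewrite the first line of the script at `path` to request a specific interpreter."""
    with open(path) as f:
        lines = f.readlines()
    shebang = "#!/usr/bin/env {}\n".format(interpreter)
    if lines and lines[0].startswith("#!"):
        lines[0] = shebang       # replace the existing shebang
    else:
        lines.insert(0, shebang)  # no shebang yet, so add one
    with open(path, "w") as f:
        f.writelines(lines)
```

Calling `pin_shebang("/path/to/customized/conf/topology.py", "python2.7")` would pin the script to Python 2.7 specifically.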