topology.py not Python 3 compatible
Labels: Apache Spark, Apache YARN, Cloudera Manager
Created on 02-14-2017 10:49 AM - edited 09-16-2022 04:05 AM
I don't know who is responsible for topology.py, but it uses Python 2 syntax, so when I try to run PySpark with Python 3 by setting
export PYSPARK_PYTHON=python3
I get lots of stack traces.
Uri
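One quick way to confirm that the failure comes from Python 2 syntax is to compile the script source under Python 3 without running it. This is a minimal sketch: `py3_syntax_error` is a hypothetical helper, and `OLD`/`NEW` are two-line excerpts in the style of the script, not the real file.

```python
def py3_syntax_error(source, filename="topology.py"):
    """Compile `source` without running it and return the SyntaxError,
    or None if it compiles cleanly under the running interpreter.

    Only syntax is checked: a Python 2-only import such as
    `from string import join` compiles fine and fails only at run time.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return err
    return None


# A Python 2 print statement, as in the unpatched script (illustrative excerpt).
OLD = "default_rack = '/default'\nprint default_rack\n"
# The function-call form of print, valid on both Python 2 and Python 3.
NEW = "default_rack = '/default'\nprint(default_rack)\n"

if __name__ == "__main__":
    err = py3_syntax_error(OLD)
    if err is not None:
        print("old: SyntaxError on line %d" % err.lineno)
    print("new:", "ok" if py3_syntax_error(NEW) is None else "SyntaxError")
```

Under Python 3 the `OLD` excerpt is rejected at line 2 while `NEW` compiles. Because `compile` never executes imports, a clean result does not rule out runtime errors such as the missing `string.join`.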
Created 02-14-2017 10:50 AM
Created 02-14-2017 12:44 PM
Any workaround for earlier versions? I'm on 5.5 and I don't manage the cluster.
Created 02-14-2017 04:16 PM
Depending on how you run the job, you may be able to override the topology script parameter, replace the topology.py script with a Python 3 compatible version, or both. If you submit jobs from the command line, the usual approach is to copy /etc/hadoop/conf to a custom directory such as /path/to/customized/conf, make your changes there, then set HADOOP_CONF_DIR=/path/to/customized/conf and run your job.
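The copy-and-override approach described above can be sketched in Python. This is a sketch under assumptions: `make_custom_conf` is a hypothetical helper, it assumes the client core-site.xml names the topology script through the standard Hadoop 2 property `net.topology.script.file.name`, and all paths are placeholders.

```python
import os
import shutil
import xml.etree.ElementTree as ET

# Standard Hadoop 2 property naming the rack-awareness (topology) script.
TOPOLOGY_PROPERTY = "net.topology.script.file.name"


def make_custom_conf(src_conf, dst_conf, topology_script=None):
    """Copy a Hadoop client config directory to dst_conf and, optionally,
    point its topology script property at a different script.

    Returns a copy of the current environment with HADOOP_CONF_DIR set to
    dst_conf, ready to pass as env= to subprocess.run().
    """
    # copytree requires that dst_conf does not exist yet.
    shutil.copytree(src_conf, dst_conf)

    if topology_script is not None:
        core_site = os.path.join(dst_conf, "core-site.xml")
        tree = ET.parse(core_site)
        for prop in tree.getroot().iter("property"):
            if prop.findtext("name") == TOPOLOGY_PROPERTY:
                value = prop.find("value")
                if value is None:
                    value = ET.SubElement(prop, "value")
                value.text = topology_script
        # ElementTree drops XML comments when it rewrites the file.
        tree.write(core_site)

    env = dict(os.environ)
    env["HADOOP_CONF_DIR"] = dst_conf
    return env
```

For example, `env = make_custom_conf("/etc/hadoop/conf", "/path/to/customized/conf", "/path/to/customized/conf/topology.py")` followed by `subprocess.run(["spark-submit", "your_job.py"], env=env)`, where the job file name is a placeholder. Patching the copied topology.py inside the custom directory leaves the cluster-managed copy untouched.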
Assuming you can change that topology script, here's the relevant portion of the diff that you can apply:

    @@ -1,8 +1,8 @@
     #!/usr/bin/env python
     #
    -# Copyright (c) 2010-2012 Cloudera, Inc. All rights reserved.
    +# Copyright (c) 2016 Cloudera, Inc. All rights reserved.
     #
    -
    +
     '''
     This script is provided by CMF for hadoop to determine network/rack topology.
     It is automatically generated and could be replaced at any time. Any changes
    @@ -12,8 +12,13 @@ made to it will be lost when this happens.
     import os
     import sys
     import xml.dom.minidom
    -from string import join
    -
    +
    +try:
    +  xrange
    +except NameError:
    +  # support for python3, which basically renamed xrange to range
    +  xrange = range
    +
     def main():
       MAP_FILE = '{{CMF_CONF_DIR}}/topology.map'
       DEFAULT_RACK = '/default'
    @@ -40,14 +45,14 @@ def main():
           map[node.getAttribute("name")] = node.getAttribute("rack")
       except:
         default_rack = "".join([ DEFAULT_RACK for _ in xrange(max_elements)])
    -    print default_rack
    +    print(default_rack)
         return -1
    -
    +
       default_rack = "".join([ DEFAULT_RACK for _ in xrange(max_elements)])
       if len(sys.argv)==1:
    -    print default_rack
    +    print(default_rack)
       else:
    -    print join([map.get(i, default_rack) for i in sys.argv[1:]], " ")
    +    print(" ".join([map.get(i, default_rack) for i in sys.argv[1:]]))
       return 0
     
     if __name__ == "__main__":
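The lookup step that the patch touches can be exercised on its own. This is a sketch, not Cloudera's script: `lookup_racks` is a hypothetical name, the map is passed in as an XML string instead of being read from topology.map, and the `<node>` element name and `<topology>` root element are assumptions (only the `name` and `rack` attributes appear in the diff). The `levels` parameter stands in for `max_elements`, whose computation is not shown in the diff.

```python
import sys
import xml.dom.minidom

try:
    xrange
except NameError:
    # Python 3 renamed xrange to range.
    xrange = range

DEFAULT_RACK = "/default"


def lookup_racks(map_xml, hosts, levels=1):
    """Return the racks for `hosts` as one space-separated line.

    `map_xml` is assumed to contain <node name="HOST" rack="/RACK"/> entries.
    Unknown hosts get DEFAULT_RACK repeated `levels` times, the same way the
    script builds its default rack from max_elements.
    """
    default_rack = "".join([DEFAULT_RACK for _ in xrange(levels)])
    dom = xml.dom.minidom.parseString(map_xml)
    racks = {}
    for node in dom.getElementsByTagName("node"):
        racks[node.getAttribute("name")] = node.getAttribute("rack")
    # str.join replaces the Python 2-only string.join(list, sep).
    return " ".join([racks.get(h, default_rack) for h in hosts])


if __name__ == "__main__":
    SAMPLE = ('<topology>'
              '<node name="10.0.0.1" rack="/rack1"/>'
              '<node name="10.0.0.2" rack="/rack2"/>'
              '</topology>')
    # With no arguments this prints "/rack1 /default".
    print(lookup_racks(SAMPLE, sys.argv[1:] or ["10.0.0.1", "10.0.0.9"]))
```

The function should run under both Python 2.7 and Python 3, which is the point of the three changes in the diff: the `xrange` shim, `print(...)` with a single argument, and `" ".join(...)` instead of `string.join`.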
Created on 11-08-2018 05:58 PM - edited 11-08-2018 06:02 PM
Hi,
I'm the cluster administrator, and our CDH version is 5.7.2.
I have the same problem. Is there a parameter I can change in Cloudera Manager (CM) to fix it?
Created 12-10-2018 09:23 AM
One easy option in older CDH versions is simply to change the shebang line at the top of the script to:
#!/usr/bin/env python2
This is actually more precise than the default, because the script really is a Python 2 script, and with Python 3 on its way it is good to be specific about which interpreter version the script needs. You could be even more explicit and require python2.7.
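If you need to apply the shebang change to a copied script (for example one under a custom HADOOP_CONF_DIR), it can be scripted. This is a minimal Python 3 sketch; `pin_shebang` is a hypothetical helper, not part of CDH or Cloudera Manager.

```python
import os
import stat
import tempfile


def pin_shebang(path, interpreter="python2"):
    """Rewrite the first line of `path` to '#!/usr/bin/env <interpreter>'.

    Returns True if an existing shebang line was replaced, or False if the
    file has no shebang (in which case it is left untouched).
    """
    with open(path) as f:
        lines = f.readlines()
    if not lines or not lines[0].startswith("#!"):
        return False
    lines[0] = "#!/usr/bin/env %s\n" % interpreter
    # Write to a temp file in the same directory, then swap it in, so an
    # interrupted write cannot leave a truncated script behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        f.writelines(lines)
    # Keep the original permission bits, including the executable bit.
    os.chmod(tmp, stat.S_IMODE(os.stat(path).st_mode))
    os.replace(tmp, path)
    return True
```

Call it as `pin_shebang("/path/to/customized/conf/topology.py")`, or pass `interpreter="python2.7"` for the stricter variant. The script's own header warns that it "could be replaced at any time", so on a Cloudera Manager-managed host the change may need to be reapplied after the configuration is redeployed.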
