
Oozie: change the Python version it uses


We have CDH 6.1.1.

When I run pyspark on the command line, it uses Python 3:

$ pyspark
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/06/07 10:47:44 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.1.1

When I run an Oozie workflow with a single Spark action and print the Python version used inside Spark, I find that it is using Python 2.7.5:

Python version
2.7.5 (default, Feb 20 2018, 09:19:12) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]
Version info.
sys.version_info(major=2, minor=7, micro=5, releaselevel='final', serial=0)

The following is the part of the Python script I used:


import sys
print("Python version")
print(sys.version)
print("Version info.")
print(sys.version_info)

print("printing the values:")
# print("Nominal Time:" + sys.argv[-1])
# print("Data Dependency:" + sys.argv[-2])
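To see which interpreter the Spark action actually picked up, it can also help to print the interpreter path and the PYSPARK_PYTHON setting alongside the version. This is only a minimal sketch: the helper name describe_interpreter is hypothetical and not part of the original script, and PYSPARK_PYTHON may simply be unset inside the Oozie container.

```python
import os
import sys


def describe_interpreter():
    """Collect details about the interpreter that is running this script."""
    return {
        # Path of the Python binary that is executing this code.
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        # PYSPARK_PYTHON is the variable Spark reads to choose the Python
        # binary; None means it is not set in this environment.
        "pyspark_python": os.environ.get("PYSPARK_PYTHON"),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print("%s: %s" % (key, value))
```

Comparing this output between the command-line pyspark session and the Oozie run should show whether the two are resolving different Python binaries.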

How can I change the Python version that Oozie uses in Spark actions to Python 3?


Master Collaborator

Hello Ahmed,


Please see this post, which gives an example of using a Python 3 script with Oozie. Note that you specify the python3 executable on the first line of your .py script (the shebang line).
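A minimal sketch of what that first line can look like. The interpreter path is an assumption (use the actual python3 location on your cluster nodes, for example an Anaconda install), and the version check with the helper name require_python3 is an illustrative addition, not something taken from the linked post.

```python
#!/usr/bin/env python3
# The line above is the shebang: it names the interpreter that should run this
# script. "/usr/bin/env python3" is an assumption; substitute the real path to
# python3 on your cluster nodes if it is not on PATH.
import sys


def require_python3(version_info=sys.version_info):
    """Raise an error if the given interpreter version is older than Python 3."""
    if version_info[0] < 3:
        raise RuntimeError(
            "Python 3 is required, but this job is running %d.%d"
            % (version_info[0], version_info[1])
        )
    return True


if __name__ == "__main__":
    require_python3()
    print("Running on Python", sys.version.split()[0])
```

If the guard fails inside the Oozie action, the error message names the version that was actually used, so the mismatch shows up directly in the job output instead of surfacing later as a syntax or import error.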


Let me know if that helps,