We have CDH 6.1.1.
When I run pyspark on the command line, it uses Python 3:
$ pyspark
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/06/07 10:47:44 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.1.1
      /_/
When I run an Oozie workflow with a single Spark action, the result is different. I printed the Python version used inside Spark and found that it is Python 2.7.5:
Python version
2.7.5 (default, Feb 20 2018, 09:19:12) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]
Version info.
sys.version_info(major=2, minor=7, micro=5, releaselevel='final', serial=0)
The following is the part of the Python script I used to print this:
 
import sys
print("Python version")
print (sys.version)
print("Version info.")
print (sys.version_info)
# In[ ]:
print("printing the values:")
print(sys.argv)
# print("Nominal Time:" + sys.argv[-1])
# print("Data Dependency:" + sys.argv[-2])

How can I change the Python version used by Oozie in Spark actions so that they use Python 3?
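To narrow this down, the version check could be extended to also print the interpreter path and the `PYSPARK_PYTHON` / `PYSPARK_DRIVER_PYTHON` environment variables, which Spark consults when choosing a Python interpreter. This is only a minimal diagnostic sketch: `interpreter_report` is a hypothetical helper name, and whether the Oozie Spark action sets or forwards these variables on this cluster is an assumption to verify, not something confirmed here. It is written to run under both Python 2.7 and Python 3.

```python
import os
import sys


def interpreter_report(environ=None):
    """Collect details that identify which Python interpreter is running.

    `environ` defaults to os.environ; a plain dict can be passed for testing.
    """
    environ = os.environ if environ is None else environ
    return {
        # Absolute path of the interpreter binary executing this script.
        "executable": sys.executable,
        # Short major.minor.micro version string.
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        # Variables Spark reads to pick the Python interpreter, if set.
        "PYSPARK_PYTHON": environ.get("PYSPARK_PYTHON", "<not set>"),
        "PYSPARK_DRIVER_PYTHON": environ.get("PYSPARK_DRIVER_PYTHON", "<not set>"),
    }


if __name__ == "__main__":
    for key, value in sorted(interpreter_report().items()):
        print("%s: %s" % (key, value))
```

Running this both from the pyspark shell and from the Oozie Spark action should show whether the two launch paths see different interpreter paths or different values for these variables.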