My cluster nodes have two versions of Python installed: 2.6.6 in /usr/bin and 2.7.12 in /usr/local/bin. I installed some geolocation-related Python modules for the 2.7.12 version. When I run a job locally on one of the nodes, it runs fine. When it is submitted through YARN, I get the following error:
“ImportError: No module named ipaddress”
ipaddress is one of the modules I installed.
I suspect YARN is using the 2.6.6 version of Python. How can I determine whether this is the case and, if it is, how can I configure YARN to use the Python in /usr/local/bin?
Could you please provide more information about what kind of "job" you are trying to run through YARN? Are you using Spark? A custom native YARN app? Distributed Shell? Knit?
Nevertheless, you could run a simple distributed shell app to see which python version YARN picks up:
yarn jar path/to/hadoop-yarn-applications-distributedshell.jar -jar path/to/hadoop-yarn-applications-distributedshell.jar -shell_command python -shell_args -V
Or you can check the same with the framework you are using.
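If the job itself is a Python script, one way to do that check is to have the script report which interpreter is running it. The snippet below is only a minimal sketch: the function name interpreter_info is made up for illustration, and it assumes you can see the container's stderr in the YARN logs for your framework. It uses only the standard library and is written to run on both 2.6 and 2.7.

```python
import sys


def interpreter_info():
    """Return (executable path, version string) for the running interpreter.

    interpreter_info is a hypothetical helper name, not part of any framework.
    """
    # version_info[:3] is (major, minor, micro), e.g. (2, 7, 12)
    version = "%d.%d.%d" % tuple(sys.version_info[:3])
    return sys.executable, version


if __name__ == "__main__":
    path, version = interpreter_info()
    # Written to stderr so it lands in the container's stderr log.
    sys.stderr.write("python executable: %s\n" % path)
    sys.stderr.write("python version: %s\n" % version)
```

If the reported path is /usr/bin/python and the version is 2.6.6, the containers are not picking up the interpreter in /usr/local/bin.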