I have a CDH 5.3.9 cluster. Earlier, we used MRv1 across our cluster for all services and clients. Owing to the nature of our applications, we have to invoke Sqoop and Hive commands within a shell action in Oozie. Under MRv1, this shell action ran properly in a distributed way.
Recently, we moved from MRv1 to YARN. Everything runs smoothly in YARN containers, except that Hive and Sqoop commands within the Oozie shell action run in 'Local Mapred' mode. They produce correct results, but they run in local mode.
When I log into my datanodes (where my shell actions would run) and manually invoke the Sqoop and Hive commands (tried as various users: yarn, mapred, hdfs), I can see a proper tracking URL for the job being submitted to YARN (i.e., in non-local mode).
I know I am missing some configuration details that MRv1 did not previously need. Can someone please help me set up my shell actions?
Some more details:
I have HA set up on HDFS as well as MRv2. Oozie and Hive are correctly able to use YARN and submit jobs to it. My shell actions run via YARN. The only problem is the Sqoop/Hive commands within the shell action in Oozie.
Thanks Harsh!! That worked very well.
I installed YARN gateway roles on all my nodes, and then set 'export HADOOP_CONF_DIR=/etc/hadoop/conf' in my shell action scripts. (I didn't use an env var on the shell action itself, to avoid killing the job.)
Previously, HADOOP_CONF_DIR was pointing to '/run/cloudera-scm-agent/process/11552-yarn-NODEMANAGER'
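For anyone hitting the same issue, here is a minimal sketch of what the top of such a shell action script might look like. It assumes the YARN gateway role has already deployed client configs to /etc/hadoop/conf; the mapred-site.xml check and the commented-out Hive command are illustrative additions, not part of my actual scripts.

```shell
#!/bin/sh
# Sketch of the start of an Oozie shell action script (illustrative only).
# Assumption: the YARN gateway role has deployed client configs to /etc/hadoop/conf.

# Point Hadoop clients (hive, sqoop) at the gateway client configuration
# rather than the NodeManager process directory.
HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_CONF_DIR

# Optional sanity check (an addition for illustration): mapred-site.xml is
# where mapreduce.framework.name is normally set, so warn if it is missing.
if [ ! -f "$HADOOP_CONF_DIR/mapred-site.xml" ]; then
    echo "warning: $HADOOP_CONF_DIR/mapred-site.xml not found" >&2
fi

echo "Using HADOOP_CONF_DIR=$HADOOP_CONF_DIR"

# The Hive or Sqoop command would follow here, for example:
# hive -e 'SHOW DATABASES;'
```

Setting the variable inside the script keeps the change local to the commands the script launches.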