
Sqoop/Hive in Oozie shell action running in local mode post MRv1 to MRv2 migration

Explorer

I have a CDH 5.3.9 cluster. Earlier, we used MRv1 across our cluster for all services and clients. Owing to the nature of our applications, we have to invoke Sqoop and Hive commands within a shell action in Oozie. Earlier, this shell action would run properly in a distributed way via MRv1.

 

Recently, we moved from MRv1 to YARN. Everything is running smoothly via YARN containers, except that Hive and Sqoop commands within the Oozie shell action run in 'Local Mapred' mode. They work correctly, but they run in local mode.

 

When I log into my datanodes (where my shell actions would run) and manually invoke the Sqoop and Hive commands (tried as various users - yarn, mapred, hdfs), I can see a proper tracking URL for the job being submitted to YARN (i.e., in non-local mode).

I know I am missing some configuration details that were previously not needed by MRv1. Can someone please help me set up my shell actions?

 

Some more details:

I have HA configured for HDFS as well as MRv2. Oozie and Hive are correctly able to use YARN and submit jobs to it. My shell actions run via YARN. The only problem is the Sqoop/Hive commands within the Oozie shell action.
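For illustration, one of my shell-action scripts looks roughly like the sketch below (the connection string, table, and paths here are placeholders, not our real job); both commands currently fall back to local mapred mode when run from the Oozie shell action:

    #!/bin/bash
    # Hypothetical shell-action script run by Oozie on a datanode.
    sqoop import \
      --connect jdbc:mysql://dbhost/salesdb \
      --table orders \
      --target-dir /data/staging/orders

    hive -e "LOAD DATA INPATH '/data/staging/orders' INTO TABLE staging.orders"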

 

1 ACCEPTED SOLUTION

Mentor
This occurs because the actions inherit the YARN NodeManager (NM) configs, which are not pre-configured for MR2. Since MR2 is an application-side concept in YARN and not a built-in/server-side one, your action environment does not find adequate MR2 configs by referencing the NM ones.

This was improved via https://issues.apache.org/jira/browse/OOZIE-2343 in CDH 5.5.0+, which ships MR2-specific configs along with the shell scripts.

For your older CDH version, however, you can try the following:

Step 1: Ensure all your hosts have a YARN/MR2 Gateway role added to them, and that the client configuration is deployed on all hosts at /etc/hadoop/conf/*.
Step 2: Add the env-var 'HADOOP_CONF_DIR=/etc/hadoop/conf' to all shell actions, either via the shell action's configuration for passing environment variables or via manual edits at the top of the shell scripts; see the sketch after these steps.
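As a minimal sketch of the env-var option, assuming the standard shell-action schema (the action name, script name, and transitions below are placeholders), the action definition would look roughly like this:

    <action name="sqoop-hive-shell">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>import_table.sh</exec>
            <!-- Point the Hadoop clients inside the script at the deployed MR2 client configs -->
            <env-var>HADOOP_CONF_DIR=/etc/hadoop/conf</env-var>
            <file>import_table.sh#import_table.sh</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>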



Explorer

Thanks Harsh!! That worked very well.

 

I installed YARN gateway roles on all my nodes, followed by setting 'export HADOOP_CONF_DIR=/etc/hadoop/conf' at the top of my shell action scripts. (I didn't use the env-var option for the shell action, to avoid killing the job.)

Previously, HADOOP_CONF_DIR was pointing to '/run/cloudera-scm-agent/process/11552-yarn-NODEMANAGER'.
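In other words, the only change to each shell-action script was the export at the top, roughly like this (a sketch; the actual Sqoop/Hive commands that follow it are omitted):

    #!/bin/bash
    # Use the deployed YARN/MR2 client configs instead of the
    # NodeManager process directory the action environment inherits.
    export HADOOP_CONF_DIR=/etc/hadoop/conf

    # Sqoop/Hive invocations below now submit to YARN instead of
    # running in local mapred mode.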