Sqoop/Hive in Oozie shell action running in local mode post MRv1-to-MRv2 migration
Created on 07-09-2017 06:43 AM - edited 09-16-2022 04:54 AM
I have a CDH 5.3.9 cluster. Earlier, we used MRv1 across the cluster for all services and clients. Owing to the nature of our applications, we have to invoke Sqoop and Hive commands within a shell action in Oozie. That shell action used to run properly, in a distributed way, via MRv1.
Recently, we moved from MRv1 to YARN. Everything runs smoothly via YARN containers, except that the Hive and Sqoop commands within the Oozie shell action run in 'Local Mapred' mode. They produce correct results, but only in local mode.
When I log into my datanodes (where the shell actions run) and manually invoke the same Sqoop and Hive commands (tried with various users - yarn, mapred, hdfs), I see a proper tracking URL for the job being submitted to YARN (i.e. non-local mode).
I suspect I am missing some configuration that MRv1 did not need. Can someone please help me set up my shell actions?
Some more details:
I have HA set up on both HDFS and MRv2. Oozie and Hive are able to use YARN and submit jobs to it correctly, and my shell actions themselves run via YARN. The only problem is the Sqoop/Hive commands inside the Oozie shell action.
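For reference, the shell action just wraps a script along these lines (the connection details and names below are placeholders, not my real values):

    #!/bin/bash
    # Wrapper script launched by the Oozie shell action.
    # Both commands complete successfully, but run in 'Local Mapred'
    # mode instead of submitting MR jobs to YARN.
    sqoop import \
      --connect "jdbc:mysql://dbhost:3306/mydb" \
      --username etl \
      --table orders \
      --target-dir /data/staging/orders
    hive -e "INSERT OVERWRITE TABLE orders_agg SELECT id, COUNT(*) FROM orders GROUP BY id"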
Created 07-09-2017 06:50 AM
The shell action inherits its environment from the NodeManager, which is not pre-configured for MR2. Since MR2 is an app-side concept in YARN and not an inbuilt/server-side one, your action environment does not find the adequate configs by referencing the NM ones.
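You can verify this from inside a shell action (or on any NodeManager host) with a quick check; the NM process directory typically lacks the MR2 client settings that a gateway config directory carries:

    # Where are the Hadoop clients reading their configs from?
    echo "HADOOP_CONF_DIR=${HADOOP_CONF_DIR}"
    # A YARN/MR2 gateway config declares the framework; the NM process dir usually does not
    grep -A1 'mapreduce.framework.name' "${HADOOP_CONF_DIR}/mapred-site.xml"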
This was improved via https://issues.apache.org/jira/browse/OOZIE-2343 in
CDH 5.5.0+, which ships configs along with the shell scripts that include
MR2 specifics.
For your older CDH version, however, you can try the below:
Step 1: Ensure all your hosts have a YARN/MR2 Gateway role added to them, and
that client configuration is deployed on all hosts at /etc/hadoop/conf/*.
Step 2: Add the env-var 'HADOOP_CONF_DIR=/etc/hadoop/conf' to all shell
actions via the shell action configuration for passing environments, or via
manual edits to the top of the shell scripts.
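For the env-var route, the action definition would look roughly like this (the action and script names here are placeholders):

    <action name="sqoop-shell">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>run_sqoop.sh</exec>
            <!-- Point the action's Hadoop clients at the gateway client configs -->
            <env-var>HADOOP_CONF_DIR=/etc/hadoop/conf</env-var>
            <file>run_sqoop.sh#run_sqoop.sh</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>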
Created 07-10-2017 01:59 AM
Thanks Harsh!! That worked very well.
I installed YARN gateway roles on all my nodes, then added 'export HADOOP_CONF_DIR=/etc/hadoop/conf' at the top of my shell action scripts. (I didn't use the env-var option on the shell action itself, to avoid having to kill the running job.)
Previously, HADOOP_CONF_DIR was pointing to '/run/cloudera-scm-agent/process/11552-yarn-NODEMANAGER'.
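For anyone else hitting this, the whole fix was just the first lines of each wrapped script:

    #!/bin/bash
    # Point Hadoop/Hive/Sqoop clients at the deployed gateway client configs
    # rather than the NodeManager's private process directory
    export HADOOP_CONF_DIR=/etc/hadoop/conf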
