Troubleshooting Oozie jobs is a pain! It kills your time and patience. 🙂


Here are a few steps that can save you valuable time:


1. Always check the Oozie launcher's stderr section to see if there is any error.

You can find a useful article here on how to check Oozie launcher logs.


2. Check the stdout logs to see whether Oozie launched a child job that hit an error and caused the launcher to fail.

Expand the stdout section and search for the string "Submitted application" to see which child jobs were triggered by the launcher.
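For example, if you have saved the launcher's stdout to a local file, the child application IDs can be pulled out with grep. This is only a sketch: the file name, log line, and application ID below are hypothetical stand-ins for your real launcher output.

```shell
# Hypothetical launcher stdout saved locally (contents made up for illustration).
cat > launcher_stdout.log <<'EOF'
2016-12-06 09:03:12,101 INFO [main] impl.YarnClientImpl - Submitted application application_1480000000000_0042
EOF

# Extract the child application IDs the launcher triggered.
grep -o 'application_[0-9]*_[0-9]*' launcher_stdout.log
# prints application_1480000000000_0042
```

Each ID printed here is a child job whose own logs you can then inspect.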


3. Some situations are more complex to troubleshoot: the child job completes successfully, there is no error in the stderr section, and yet the launcher fails with the error "Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]".


Sample stdout logs:

2016-12-06 09:03:39,986 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1367)) -  map 100% reduce 0%
2016-12-06 09:03:39,991 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1378)) - Job job_XXXXXX_YYYY completed successfully
...
2016-12-06 09:03:40,228 DEBUG [main] hive.TableDefWriter (TableDefWriter.java:getLoadDataStmt(252)) - Load statement: LOAD DATA INPATH 'hdfs://XXXXXXX' OVERWRITE INTO TABLE `XXXXXX` 
65695 [main] INFO org.apache.sqoop.hive.HiveImport - Loading uploaded data into Hive 
2016-12-06 09:03:40,229 INFO [main] hive.HiveImport (HiveImport.java:importTable(195)) - Loading uploaded data into Hive 
...
65711 [main] DEBUG org.apache.sqoop.hive.HiveImport - Using in-process Hive instance. 
2016-12-06 09:03:40,245 DEBUG [main] hive.HiveImport (HiveImport.java:executeScript(326)) - Using in-process Hive instance. 
[Loaded org.apache.sqoop.util.SubprocessSecurityManager from file:/dataXXX/hadoop/yarn/local/filecache/693/sqoop-1.4.6.2.3.4.0-3485.jar] 
[Loaded org.apache.sqoop.util.ExitSecurityException from file:/dataXXX/hadoop/yarn/local/filecache/693/sqoop-1.4.6.2.3.4.0-3485.jar] 
[Loaded com.cloudera.sqoop.util.ExitSecurityException from file:/dataXXX/hadoop/yarn/local/filecache/693/sqoop-1.4.6.2.3.4.0-3485.jar] 
65714 [main] DEBUG org.apache.sqoop.util.SubprocessSecurityManager - Installing subprocess security manager 
2016-12-06 09:03:40,248 DEBUG [main] util.SubprocessSecurityManager (SubprocessSecurityManager.java:install(59)) - Installing subprocess security manager 
[Loaded org.apache.hadoop.hive.ql.metadata.HiveException from file:/dataXXX/hadoop/yarn/local/filecache/778/hive-exec-1.2.1.2.3.4.0-3485.jar] 
[Loaded org.apache.hadoop.hive.ql.security.authorization.plugin.HiveMetastoreClientFactory from file:/dataXXX/hadoop/yarn/local/filecache/778/hive-exec-1.2.1.2.3.4.0-3485.jar] 
...
[Loaded org.apache.oozie.action.hadoop.JavaMainException from file:/dataXXX/hadoop/yarn/local/filecache/365/oozie-sharelib-oozie-4.2.0.2.3.4.0-3485.jar] 
[Loaded org.apache.oozie.action.hadoop.LauncherMainException from file:/dataXXX/hadoop/yarn/local/filecache/365/oozie-sharelib-oozie-4.2.0.2.3.4.0-3485.jar] 
Intercepting System.exit(1) 
<<< Invocation of Main class completed <<< 
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1] 
Oozie Launcher failed, finishing Hadoop job gracefully


How to troubleshoot this?

By default, when a YARN application finishes, the NodeManager deletes the temporary data from its local container directories. To debug the issue above, we have to retain that data for some time and check the hive.log inside the container's directory.

Below are the detailed steps to do this:

1. Add the property below to yarn-site.xml to retain container directories after the application finishes: yarn.nodemanager.delete.debug-delay-sec=1800 (I have set it to 30 minutes; change the value as per your convenience).
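On an Ambari-managed cluster you would set this through the YARN configuration screen; if you edit yarn-site.xml by hand, the property looks like the following sketch (1800 seconds is just the 30-minute example value from above):

```xml
<!-- yarn-site.xml: keep finished containers' local directories for 30 minutes -->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>1800</value>
</property>
```

Remember to revert this once you are done debugging, or the retained container directories will keep consuming local disk space.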

2. Restart the required services via Ambari.

3. Rerun the Oozie job.

4. Go to the failed launcher job's logs and find the NodeManager on which the launcher ran (and failed).

5. Expand the launch container section of the application logs.

6. Find the value of PWD.

7. Log in to that NodeManager and cd to the PWD value obtained in step 6.

8. Find the file named hive.log inside the container's directory.

e.g. find . -name hive.log 

9. hive.log should contain the actual error, which is not visible in the application logs.
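As a sketch, steps 7-9 boil down to a find plus grep from the container directory. The directory name and error line below are fabricated stand-ins recreated locally to show the search; on a real NodeManager you would run only the last command from the PWD found in step 6.

```shell
# Fabricated stand-in for a retained container directory (names are made up).
mkdir -p container_e01_000001/tmp
echo '2016-12-06 09:03:41,102 ERROR [main] exec.Task - sample Hive failure' \
  > container_e01_000001/tmp/hive.log

# Locate hive.log under the current directory and print its ERROR lines,
# prefixed with the file path (-H).
find . -name hive.log -exec grep -H 'ERROR' {} +
```

Whatever grep prints here is the real failure that never made it into the launcher's application logs.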


Please comment if you have any feedback, questions, or suggestions. Happy Hadooping!! :)
