Created 09-30-2016 06:25 AM
I am stuck with an issue where my job hangs in the RUNNING state; I suspect it might be due to the scheduler. Which scheduler is better to use in a 3-node Hadoop 2.x cluster: Fair or Capacity? Can anyone help me with the configuration of these two and explain how they work? How are queues used? Any help would be appreciated. Thanks
Created 09-30-2016 10:30 AM
Do other jobs run? Have you tried a smoke test of the Hadoop, MR, Oozie & YARN services?
If you can submit a job, it appears the queue is available; however, it may be starved for resources in the queue or on the nodes. The YARN ResourceManager and NodeManager logs may provide more detail on any issues running the job. The Oozie web UI and logs may also have some information about what is blocking.
http://hortonworks.com/hadoop-tutorial/configuring-yarn-capacity-scheduler-ambari/
Try defining a single large queue to run the job and see if it completes successfully.
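A single large queue along those lines could look roughly like this in capacity-scheduler.xml. This is a sketch, not a drop-in file: only the queue-sizing entries are shown, and the queue name matches the default queue from a stock install. The property names are the standard CapacityScheduler keys.

```xml
<!-- Sketch of a minimal capacity-scheduler.xml: one queue ("default")
     owning 100% of the cluster, so a single job cannot be starved by
     per-queue capacity limits. -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>100</value>
  </property>
  <property>
    <!-- Allow a single user to take the whole queue on a small cluster -->
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
  </property>
</configuration>
```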
cheers,
Andrew
Created 09-30-2016 10:55 AM
Thanks Andrew. Simple shell scripts with echo statements run fine, but shell scripts that invoke commands get stuck. I tried a wordcount script (a MapReduce job) and it also gets stuck. I tried changing the scheduler: with the Fair Scheduler configured it works properly, but now a new issue arises: the running application is not displayed in the ResourceManager UI. Any help with this issue would be appreciated. Thanks
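For reference, a minimal Fair Scheduler allocation file (fair-scheduler.xml, pointed to by yarn.scheduler.fair.allocation.file) for a small cluster like this might look like the sketch below; the single queue name here is an assumption.

```xml
<!-- Sketch of a minimal fair-scheduler.xml allocation file: one queue
     with full weight, so jobs in it can use the whole cluster. -->
<allocations>
  <queue name="default">
    <weight>1.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
</allocations>
```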
Created 09-30-2016 08:02 PM
@Himanshu Kukreja have you looked at the NodeManager and ResourceManager logs? You said a simple shell script works, so I recommend adding some debugging commands to your shell script, depending on the nature of your commands; I have a similar approach here: https://github.com/dbist/oozie/tree/master/apps/shell. To make sure your Unix commands execute, you can print them out, print out whether they are installed on the nodes, etc. Without logs we can't help.
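A debugging preamble along those lines, using only standard Unix commands, might look like this. It is a sketch: the list of commands to check is an assumption and should match whatever job.sh actually calls.

```shell
# Hypothetical debugging preamble for the top of job.sh: everything is
# printed to stderr so it lands in the launcher stderr log that Oozie
# collects for the shell action.
echo "user:  $(whoami)"   >&2
echo "host:  $(hostname)" >&2
echo "cwd:   $(pwd)"      >&2
echo "PATH:  $PATH"       >&2

# Check that the commands the script depends on are actually installed
# on this node (the action may be scheduled on any NodeManager host).
# The command list is an assumption; adjust it to what job.sh uses.
for cmd in sh hadoop; do
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "found:   $cmd -> $(command -v "$cmd")" >&2
    else
        echo "MISSING: $cmd" >&2
    fi
done
```

Because the output goes to stderr, it shows up in the action's launcher logs even when the script itself produces nothing on stdout.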
Created 10-11-2016 01:28 PM
Please find the attached log files for the same job: stderr-master-node.txt, syslog-master-node.txt, stderr-slave-node.txt, stdout-slave-node.txt, syslog-slave-node.txt. Sometimes the job stays stuck in the RUNNING state, and sometimes it fails with the error:
Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1] - JA018. Below are the details; the log files are attached.
job.properties
nameNode=ns613.mycyberhosting.com:8032
queueName=default
user.name=root
oozie.libpath=${nameNode}/user/root/share/lib/lib_20160905172157
oozie.use.system.libpath=true
seeds_shRoot=seeds_sh
oozie.wf.application.path=${nameNode}/user/${user.name}/${seeds_shRoot}/apps/shell
workflow.xml
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="seeds-shell-wf">
<start to="seed-shell"/>
<action name="seed-shell">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<exec>job.sh</exec>
<env-var>HADOOP_USER_NAME=root</env-var>
<file>/user/root/seeds_sh/job.sh#job.sh</file>
</shell>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
script- job.sh
/home/c1/apache-nutch-2.3.1/runtime/deploy/bin/crawl /user/root/seeds_sh/input-data/seeds_test 2 http://ns613.mycyberhosting.com:8983/solr/ddcds 1
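One way to make this script fail loudly, rather than with a bare exit code 1, is a small pre-flight check before the crawl invocation. The helper below is hypothetical (not part of the original script); the crawl path it would guard is the one from the line above.

```shell
# Hypothetical pre-flight check for job.sh: verify that a script exists
# and is executable on whichever node the Oozie shell action landed on,
# and print a diagnostic to stderr so the launcher log explains a JA018
# exit code 1 instead of failing silently.
check_script() {
    if [ -x "$1" ]; then
        echo "ok: $1 is executable on $(hostname)" >&2
        return 0
    fi
    echo "ERROR: $1 missing or not executable on $(hostname)" >&2
    return 1
}

# In job.sh this would guard the real invocation, e.g.:
#   check_script /home/c1/apache-nutch-2.3.1/runtime/deploy/bin/crawl || exit 1
```

A check like this also catches the common case where the Nutch install exists on the master but not on every NodeManager host, since the shell action can be scheduled on any node.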
Created 10-11-2016 01:29 PM
oozie.log
Created 10-15-2016 07:00 AM
@Artem Ervits please check the above log files and help me resolve the issue. Thanks and regards
Created 10-18-2016 07:58 AM
@Himanshu Kukreja can you confirm MR job succeeds when run w/out Oozie? Please provide your capacity-scheduler.xml file.
Created 10-19-2016 06:45 AM
Yes, the MR job succeeds when run without Oozie. Attaching capacity-scheduler.xml (uploaded as capacity-scheduler.txt).
Created 10-19-2016 06:50 AM
@Artem Ervits yes, the MR job succeeds when run without Oozie (capacity-scheduler.txt attached). The script also runs fine without Oozie. The logs say the launcher fails when it runs with Oozie, and I am not able to figure out anything from them. Please help me resolve the issue.