
Oozie Job Stuck - Runs Normally When Run Manually


I am trying to run an Oozie workflow that contains a few Pig and Hive scripts along with a few hadoop fs commands.

When I start the workflow, it keeps running without completing even one sub-job/task in the workflow.


Even when I reduce the workflow to just this single Pig script, the same thing happens.
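For reference, the workflow action is set up roughly like this (a minimal sketch; the app name, script name, and transitions are placeholders, not my exact values):

```xml
<workflow-app name="arrest-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>arrests.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pig action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```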

Here are the log details for the first Pig job in the workflow:

2017-06-11 15:38:55,030 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2017-06-11 15:38:55,081 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - Setting up single store job
2017-06-11 15:38:55,097 [main] INFO  - Key [pig.schematuple] is false, will not generate code.
2017-06-11 15:38:55,100 [main] INFO  - Starting process to move generated code to distributed cache
2017-06-11 15:38:55,101 [main] INFO  - Setting key [pig.schematuple.classes] with classes to deserialize []
2017-06-11 15:38:55,308 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 1 map-reduce job(s) waiting for submission.
2017-06-11 15:38:55,309 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2017-06-11 15:38:55,320 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy  - Connecting to ResourceManager at quickstart.cloudera/
2017-06-11 15:38:55,350 [JobControl] INFO  org.apache.hadoop.conf.Configuration.deprecation  - is deprecated. Instead, use fs.defaultFS
2017-06-11 15:38:55,901 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat  - Total input paths to process : 1
2017-06-11 15:38:55,902 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil  - Total input paths to process : 1
2017-06-11 15:38:55,973 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil  - Total input paths (combined) to process : 1
2017-06-11 15:38:56,044 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter  - number of splits:1
2017-06-11 15:38:56,176 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter  - Submitting tokens for job: job_1497217926065_0007
2017-06-11 15:38:56,176 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter  - Kind: mapreduce.job, Service: job_1497217926065_0006, Ident: (
2017-06-11 15:38:56,176 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter  - Kind: RM_DELEGATION_TOKEN, Service:, Ident: (RM_DELEGATION_TOKEN owner=cloudera, renewer=oozie mr token, realUser=oozie, issueDate=1497220716534, maxDate=1497825516534, sequenceNumber=13, masterKeyId=2)
2017-06-11 15:38:56,812 [JobControl] WARN  org.apache.hadoop.mapreduce.v2.util.MRApps  - cache file (mapreduce.job.cache.files) hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20170405043005/pig/json-simple-1.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20170405043005/oozie/json-simple-1.1.jar This will be an error in Hadoop 2.0
2017-06-11 15:38:56,865 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl  - Submitted application application_1497217926065_0007
2017-06-11 15:38:56,911 [JobControl] INFO  org.apache.hadoop.mapreduce.Job  - The url to track the job: http://quickstart.cloudera:8088/proxy/application_1497217926065_0007/
2017-06-11 15:38:56,911 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - HadoopJobId: job_1497217926065_0007
2017-06-11 15:38:56,911 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Processing aliases ArrestData,FArrestData,MostArrestByAge,TotalArrestByAge
2017-06-11 15:38:56,911 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - detailed locations: M: ArrestData[3,13],FArrestData[7,14],TotalArrestByAge[14,19],MostArrestByAge[10,18] C: TotalArrestByAge[14,19],MostArrestByAge[10,18] R: TotalArrestByAge[14,19]
2017-06-11 15:38:56,911 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - More information at: http://quickstart.cloudera:50030/jobdetails.jsp?jobid=job_1497217926065_0007
2017-06-11 15:38:56,944 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 0% complete
2017-06-11 15:39:09,536 [Service Thread] INFO  org.apache.pig.impl.util.SpillableMemoryManager  - first memory handler call- Usage threshold init = 34603008(33792K) used = 24278176(23709K) committed = 34603008(33792K) max = 34603008(33792K)
Heart beat
Heart beat
Heart beat
Heart beat

The Pig script I am executing:

-- First load the arrest data into relation.

ArrestData = Load 'sat/BPD_Arrests.csv' USING PigStorage(',') AS (Arrest,Age,Sex,Race,ArrestDate,ArrestTime,ArrestLocation,IncidentOffense,IncidentLocation,Charge,ChargeDescription,District,Post,Neighborhood,Location1);

-- Group the filtered data by Age.

FArrestData = FILTER ArrestData By $0>0;

MostArrestByAge = GROUP FArrestData BY Age;

-- Now get the count of arrests for each Age group.

TotalArrestByAge = FOREACH MostArrestByAge GENERATE  group, COUNT(FArrestData) as Total;

--FTotalArrestByAge= ORDER TotalArrestByAge By Total  Desc;

STORE  TotalArrestByAge INTO 'sat/ArrestByAge.csv' USING PigStorage(',');
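To double-check the expected output, here is a minimal local sketch in Python of what the script computes (count of arrests per Age, after filtering on the first field). The sample rows are made up for illustration and only mirror the column order of the real CSV:

```python
# Local sanity check of the Pig aggregation: filter on the first
# field, group by Age (second field), and count rows per group.
import csv
from collections import Counter
from io import StringIO

# Made-up sample rows with the same column order as BPD_Arrests.csv.
sample = StringIO(
    "1,25,M,B,01/01/2017,10:00,loc,off,iloc,ch,desc,d,p,n,l\n"
    "2,25,F,W,01/02/2017,11:00,loc,off,iloc,ch,desc,d,p,n,l\n"
    "3,30,M,B,01/03/2017,12:00,loc,off,iloc,ch,desc,d,p,n,l\n"
)

counts = Counter()
for row in csv.reader(sample):
    arrest, age = row[0], row[1]
    if float(arrest) > 0:  # mirrors: FILTER ArrestData BY $0 > 0
        counts[age] += 1   # mirrors: GROUP ... BY Age; COUNT(...)

# Print in the same shape as the STORE output (Age,Total).
for age, total in sorted(counts.items()):
    print(f"{age},{total}")
```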

When I run this .pig file directly from the CLI, it completes in less than 60 minutes.


Please let me know if I am missing anything here. Thanks
