New Contributor
Posts: 6
Registered: ‎06-06-2017

oozie pig heart beat

Hello,

I've downloaded a free Cloudera distribution and I'm trying to run a Pig test with Oozie.

The Pig script is very simple: it loads a file and stores it in an HDFS directory.

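-- Load the semicolon-delimited input file; every field is read as chararray.
master1 = LOAD '$INPUT'
    USING PigStorage(';') AS (
        year:chararray,
        dep:chararray,
        nbetab:chararray,
        cap:chararray);

-- Write the relation unchanged to the output directory.
STORE master1 INTO '$OUTPUT';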

 

My job.properties is:

oozie.use.system.libpath=True
security_enabled=False
dryrun=True
jobTracker=quickstart.cloudera:8032
nameNode=hdfs://quickstart.cloudera:8020

My workflow.xml is:

<workflow-app name="testpiggy" xmlns="uri:oozie:workflow:0.5">
  <global>
            <configuration>
                <property>
                    <name>mapred.map.tasks</name>
                    <value>4</value>
                </property>
                <property>
                    <name>mapred.reduce.tasks</name>
                    <value>4</value>
                </property>
            </configuration>
  </global>
    <start to="pig-31ed"/>
    <kill name="Kill">

When I try to execute this workflow, I get the following output (and I have to kill the job):

Pig command arguments :
             -file
             testoozie.pig
             -param
             INPUT=/user/cloudera/chantiercible/lot2/pig/datasource/datacapacite.csv
             -param
             OUTPUT=/user/cloudera/chantiercible/lot2/oozie/pigdemo/Odata
             -log4jconf
             /yarn/nm/usercache/cloudera/appcache/application_1498580068153_0005/container_1498580068153_0005_01_000002/piglog4j.properties
             -logfile
             pig-job_1498580068153_0005.log
Fetching child yarn jobs
tag id : oozie-ff953aec56888fb3b7c4bea48fc7aab9
Child yarn jobs are found - 
=================================================================

>>> Invoking Pig command line now >>>


Run pig script using PigRunner.run() for Pig version 0.8+
Apache Pig version 0.12.0-cdh5.8.0 (rexported) 
compiled Jun 16 2016, 12:40:41

Run pig script using PigRunner.run() for Pig version 0.8+
2017-06-27 09:49:55,709 [main] INFO  org.apache.pig.Main  - Apache Pig version 0.12.0-cdh5.8.0 (rexported) compiled Jun 16 2016, 12:40:41
2017-06-27 09:49:55,711 [main] INFO  org.apache.pig.Main  - Logging error messages to: /yarn/nm/usercache/cloudera/appcache/application_1498580068153_0005/container_1498580068153_0005_01_000002/pig-job_1498580068153_0005.log
2017-06-27 09:49:55,868 [main] INFO  org.apache.pig.impl.util.Utils  - Default bootup file /var/lib/hadoop-yarn/.pigbootup not found
2017-06-27 09:49:56,110 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-06-27 09:49:56,111 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-06-27 09:49:56,111 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine  - Connecting to hadoop file system at: hdfs://quickstart.cloudera:8020
2017-06-27 09:49:56,123 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine  - Connecting to map-reduce job tracker at: quickstart.cloudera:8032
2017-06-27 09:49:56,131 [main] WARN  org.apache.pig.PigServer  - Empty string specified for jar path
2017-06-27 09:49:57,168 [main] INFO  org.apache.pig.tools.pigstats.ScriptState  - Pig features used in the script: UNKNOWN
2017-06-27 09:49:57,283 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer  - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier, PartitionFilterOptimizer]}
2017-06-27 09:49:57,417 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator
2017-06-27 09:49:57,683 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler  - File concatenation threshold: 100 optimistic? false
2017-06-27 09:49:57,751 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer  - MR plan size before optimization: 1
2017-06-27 09:49:57,751 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer  - MR plan size after optimization: 1
2017-06-27 09:49:57,941 [main] INFO  org.apache.hadoop.yarn.client.RMProxy  - Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
2017-06-27 09:49:58,084 [main] INFO  org.apache.pig.tools.pigstats.ScriptState  - Pig script settings are added to the job
2017-06-27 09:49:58,180 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2017-06-27 09:49:58,181 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-06-27 09:49:58,181 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2017-06-27 09:49:58,188 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - creating jar file Job7371613025379832997.jar
2017-06-27 09:50:04,390 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - jar file Job7371613025379832997.jar created
2017-06-27 09:50:04,390 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2017-06-27 09:50:04,443 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - Setting up single store job
2017-06-27 09:50:04,473 [main] INFO  org.apache.pig.data.SchemaTupleFrontend  - Key [pig.schematuple] is false, will not generate code.
2017-06-27 09:50:04,474 [main] INFO  org.apache.pig.data.SchemaTupleFrontend  - Starting process to move generated code to distributed cache
2017-06-27 09:50:04,476 [main] INFO  org.apache.pig.data.SchemaTupleFrontend  - Setting key [pig.schematuple.classes] with classes to deserialize []
2017-06-27 09:50:04,602 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 1 map-reduce job(s) waiting for submission.
2017-06-27 09:50:04,604 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2017-06-27 09:50:04,641 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy  - Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
2017-06-27 09:50:04,717 [JobControl] INFO  org.apache.hadoop.conf.Configuration.deprecation  - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-06-27 09:50:05,784 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat  - Total input paths to process : 1
2017-06-27 09:50:05,785 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil  - Total input paths to process : 1
2017-06-27 09:50:05,824 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil  - Total input paths (combined) to process : 1
2017-06-27 09:50:05,891 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter  - number of splits:1
2017-06-27 09:50:06,130 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter  - Submitting tokens for job: job_1498580068153_0006
2017-06-27 09:50:06,131 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter  - Kind: mapreduce.job, Service: job_1498580068153_0005, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@36358f96)
2017-06-27 09:50:06,131 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter  - Kind: RM_DELEGATION_TOKEN, Service: 127.0.0.1:8032, Ident: (owner=cloudera, renewer=oozie mr token, realUser=oozie, issueDate=1498582168340, maxDate=1499186968340, sequenceNumber=13, masterKeyId=2)
2017-06-27 09:50:07,113 [JobControl] WARN  org.apache.hadoop.mapreduce.v2.util.MRApps  - cache file (mapreduce.job.cache.files) hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20160810143758/pig/json-simple-1.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20160810143758/oozie/json-simple-1.1.jar This will be an error in Hadoop 2.0
2017-06-27 09:50:07,236 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl  - Submitted application application_1498580068153_0006
2017-06-27 09:50:07,334 [JobControl] INFO  org.apache.hadoop.mapreduce.Job  - The url to track the job: http://quickstart.cloudera:8088/proxy/application_1498580068153_0006/
2017-06-27 09:50:07,334 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - HadoopJobId: job_1498580068153_0006
2017-06-27 09:50:07,334 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Processing aliases master1
2017-06-27 09:50:07,334 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - detailed locations: M: master1[11,10],master1[-1,-1] C:  R: 
2017-06-27 09:50:07,335 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - More information at: http://quickstart.cloudera:50030/jobdetails.jsp?jobid=job_1498580068153_0006
2017-06-27 09:50:07,465 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 0% complete
Heart beat
2017-06-27 09:50:25,887 [Service Thread] INFO  org.apache.pig.impl.util.SpillableMemoryManager  - first memory handler call- Usage threshold init = 34603008(33792K) used = 24342512(23771K) committed = 34603008(33792K) max = 34603008(33792K)
Heart beat
Heart beat
Heart beat
Heart beat

...and so on.

As you can see in my workflow.xml, I tried setting mapred.map.tasks and mapred.reduce.tasks to 4, hoping that would make my job work.

But it doesn't.

Have I missed something?

Bloody Oozie!

;-)

 

Thanks in advance for any help.

Re: oozie pig heart beat

Hello again,

For anyone who has the same problem as me on a pseudo-distributed installation:

1) You must install Cloudera Express.

2) I changed these YARN parameters: yarn.nodemanager.res

   Container Memory: from 3 GB to 8 GB

   Container Virtual CPU Cores: from 3 to 8

And now it works.

A miracle...

If somebody has an explanation, it is welcome.

;-)
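
A possible explanation, offered as a sketch rather than a confirmed diagnosis: an Oozie Pig action runs inside a launcher MapReduce job, and that launcher keeps its own YARN containers (an ApplicationMaster plus one map task) for as long as it waits for the Pig job it submitted. The Pig job then needs its own ApplicationMaster and at least one map container on top of that. On a single NodeManager with little memory or few vcores, the launcher can hold the resources the child job is waiting for, so the launcher only prints "Heart beat" while the child stays at 0%. The Python below illustrates the arithmetic. The container sizes (1024 MB and 1 vcore each) and the names `Container`, `REQUESTS` and `allocate` are assumptions made up for this illustration; they are not values read from this cluster or from YARN itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    name: str
    memory_mb: int
    vcores: int


# Hypothetical container requests, in the order they are made.
# The sizes are assumed for illustration, not taken from the cluster in this thread.
REQUESTS = [
    Container("oozie launcher AM", 1024, 1),
    Container("oozie launcher map task", 1024, 1),
    Container("pig job AM", 1024, 1),
    Container("pig map task", 1024, 1),
]


def allocate(requests, node_memory_mb, node_vcores):
    """Grant each request that still fits on the node, in request order.

    Returns (granted_names, blocked_names). This is a toy model of a single
    NodeManager, not the real YARN scheduler.
    """
    granted, blocked = [], []
    free_mem, free_cores = node_memory_mb, node_vcores
    for c in requests:
        if c.memory_mb <= free_mem and c.vcores <= free_cores:
            granted.append(c.name)
            free_mem -= c.memory_mb
            free_cores -= c.vcores
        else:
            blocked.append(c.name)
    return granted, blocked


if __name__ == "__main__":
    for label, mem, cores in [("before", 3072, 3), ("after", 8192, 8)]:
        granted, blocked = allocate(REQUESTS, mem, cores)
        print(label, "granted:", granted, "blocked:", blocked)
```

Under these assumed sizes, the 3 GB / 3 vcore node can grant only three of the four containers, so the Pig map task never starts, while the 8 GB / 8 vcore node fits all four. The state of the child application in the ResourceManager UI (port 8088 in the log above) would show whether it is really waiting for resources.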
