
Issue when running MapReduce in the VM with Pig - 0% progress and stuck


Hi, I consistently hit the same issue when running jobs from HUE/Pig: the MapReduce job stays at 0% and never progresses.

 

I have reduced the script I am trying to run to something really small and simple:

 

mytags = LOAD 'test.csv' USING PigStorage();

describe mytags;

dump mytags;

 

The test.csv file contains just a small number of lines.

 

Here are the logs from the job while it is running. It just stays in this stuck state:

 

Run pig script using PigRunner.run() for Pig version 0.8+
Apache Pig version 0.12.0-cdh5.12.0 (rexported) 
compiled Jun 29 2017, 04:34:31

Run pig script using PigRunner.run() for Pig version 0.8+
2017-12-29 02:18:42,601 [main] INFO  org.apache.pig.Main  - Apache Pig version 0.12.0-cdh5.12.0 (rexported) compiled Jun 29 2017, 04:34:31
2017-12-29 02:18:42,602 [main] INFO  org.apache.pig.Main  - Logging error messages to: /yarn/nm/usercache/cloudera/appcache/application_1513869977223_0016/container_1513869977223_0016_01_000002/pig-job_1513869977223_0016.log
2017-12-29 02:18:42,649 [main] INFO  org.apache.pig.impl.util.Utils  - Default bootup file /var/lib/hadoop-yarn/.pigbootup not found
2017-12-29 02:18:42,755 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-12-29 02:18:42,755 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-12-29 02:18:42,755 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine  - Connecting to hadoop file system at: hdfs://quickstart.cloudera:8020
2017-12-29 02:18:42,760 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine  - Connecting to map-reduce job tracker at: quickstart.cloudera:8032
2017-12-29 02:18:42,762 [main] WARN  org.apache.pig.PigServer  - Empty string specified for jar path
Schema for mytags unknown.
2017-12-29 02:18:43,337 [main] INFO  org.apache.pig.tools.pigstats.ScriptState  - Pig features used in the script: UNKNOWN
2017-12-29 02:18:43,372 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer  - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier, PartitionFilterOptimizer]}
2017-12-29 02:18:43,442 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler  - File concatenation threshold: 100 optimistic? false
2017-12-29 02:18:43,486 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer  - MR plan size before optimization: 1
2017-12-29 02:18:43,486 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer  - MR plan size after optimization: 1
2017-12-29 02:18:43,569 [main] INFO  org.apache.hadoop.yarn.client.RMProxy  - Connecting to ResourceManager at quickstart.cloudera/10.0.2.15:8032
2017-12-29 02:18:43,598 [main] INFO  org.apache.pig.tools.pigstats.ScriptState  - Pig script settings are added to the job
2017-12-29 02:18:43,628 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2017-12-29 02:18:43,628 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-12-29 02:18:43,628 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2017-12-29 02:18:43,632 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - creating jar file Job8033199913034010101.jar
2017-12-29 02:18:46,115 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - jar file Job8033199913034010101.jar created
2017-12-29 02:18:46,116 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2017-12-29 02:18:46,141 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - Setting up single store job
2017-12-29 02:18:46,184 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 1 map-reduce job(s) waiting for submission.
2017-12-29 02:18:46,185 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation  - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2017-12-29 02:18:46,193 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy  - Connecting to ResourceManager at quickstart.cloudera/10.0.2.15:8032
2017-12-29 02:18:46,225 [JobControl] INFO  org.apache.hadoop.conf.Configuration.deprecation  - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-12-29 02:18:46,508 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat  - Total input paths to process : 1
2017-12-29 02:18:46,508 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil  - Total input paths to process : 1
2017-12-29 02:18:46,526 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil  - Total input paths (combined) to process : 1
2017-12-29 02:18:46,564 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter  - number of splits:1
2017-12-29 02:18:46,626 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter  - Submitting tokens for job: job_1513869977223_0017
2017-12-29 02:18:46,626 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter  - Kind: mapreduce.job, Service: job_1513869977223_0016, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@39cee17e)
2017-12-29 02:18:46,626 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter  - Kind: RM_DELEGATION_TOKEN, Service: 10.0.2.15:8032, Ident: (RM_DELEGATION_TOKEN owner=cloudera, renewer=oozie mr token, realUser=oozie, issueDate=1514542711324, maxDate=1515147511324, sequenceNumber=39, masterKeyId=3)
2017-12-29 02:18:47,068 [JobControl] WARN  org.apache.hadoop.mapreduce.v2.util.MRApps  - cache file (mapreduce.job.cache.files) hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20170719053712/pig/json-simple-1.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://quickstart.cloudera:8020/user/oozie/share/lib/lib_20170719053712/oozie/json-simple-1.1.jar This will be an error in Hadoop 2.0
2017-12-29 02:18:47,124 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl  - Submitted application application_1513869977223_0017
2017-12-29 02:18:47,166 [JobControl] INFO  org.apache.hadoop.mapreduce.Job  - The url to track the job: http://quickstart.cloudera:8088/proxy/application_1513869977223_0017/
2017-12-29 02:18:47,166 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - HadoopJobId: job_1513869977223_0017
2017-12-29 02:18:47,166 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Processing aliases mytags
2017-12-29 02:18:47,166 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - detailed locations: M: mytags[1,9] C:  R: 
2017-12-29 02:18:47,166 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - More information at: http://quickstart.cloudera:50030/jobdetails.jsp?jobid=job_1513869977223_0017
2017-12-29 02:18:47,202 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 0% complete
Heart beat
Heart beat
Heart beat
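
One detail in the log above that is easy to miss: it refers to two different YARN applications. The Pig script itself runs inside a container of application_1513869977223_0016 (see the "Logging error messages to" path), while the MapReduce job it submits is application_1513869977223_0017. Below is a minimal Python sketch, not part of the original run, that pulls the distinct application IDs out of pasted log text. The names `APP_ID` and `application_ids` are made up for illustration, and the sample lines are copied from the log above.

```python
import re

# Four lines copied verbatim from the log above.
LOG = """\
2017-12-29 02:18:42,602 [main] INFO  org.apache.pig.Main  - Logging error messages to: /yarn/nm/usercache/cloudera/appcache/application_1513869977223_0016/container_1513869977223_0016_01_000002/pig-job_1513869977223_0016.log
2017-12-29 02:18:47,124 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl  - Submitted application application_1513869977223_0017
2017-12-29 02:18:47,166 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - HadoopJobId: job_1513869977223_0017
2017-12-29 02:18:47,202 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 0% complete
"""

# application_, job_ and container_ IDs all begin with the same
# <cluster timestamp>_<sequence number> pair, so one pattern covers them.
APP_ID = re.compile(r"(?:application|job|container)_(\d+)_(\d+)")


def application_ids(log_text):
    """Return distinct application IDs in order of first appearance."""
    seen = []
    for match in APP_ID.finditer(log_text):
        app = f"application_{match.group(1)}_{match.group(2)}"
        if app not in seen:
            seen.append(app)
    return seen


print(application_ids(LOG))
```

On the lines above this prints the two IDs, ending in _0016 and _0017. When looking at the ResourceManager UI at http://quickstart.cloudera:8088, it is worth checking the state of both applications rather than only the one that reports 0%.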

 

I suspect that something is not right in the default VM configuration, but I am not sure where to start. My VM has 12 GB of memory and 4 CPUs assigned within VirtualBox.

 

Could it be a configuration setting that needs to change in the VM image (perhaps through Cloudera Manager)? If so, the default in the VM image may need to change.

 

Has anybody managed to get a MapReduce job to run from within HUE/Pig 'out of the box'?

 

From within Cloudera Manager I can see that all services except HBase are running and green.

 

Many thanks.
