I have some data files in HDFS on a Hortonworks cluster. I need to dump this data in the Pig shell, running Pig in MapReduce mode. After loading the data from HDFS, when I run the DUMP command, the MapReduce job gets stuck at 0% and does not complete even after a long time.
I have 3 EC2 instances, each with a 4-core CPU and 16 GB of main memory, and below are the configurations for mapred-site.xml and yarn-site.xml.
I followed these steps:
1) Start Pig in MapReduce mode: pig -x mapreduce
2) Load data into Pig from an HDFS directory: mapdata = load 'hdfs://ip-xxx-xx-xx-xx.us-east-2.compute.internal:8020/user/abc/datadir1' as (a:map[chararray]);
3) Print the data: dump mapdata;
After executing the third step, I get the following messages in the shell:
2018-10-09 07:25:51,099 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2018-10-09 07:25:51,099 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1539066382468_0147]
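For anyone trying to reproduce or diagnose this, the state of the stuck job can be checked on the YARN side. The following is only a minimal sketch, assuming the yarn CLI is installed and on the PATH of the node where Pig was started. It derives the YARN application ID from the job ID in the log above (the two share the same numeric suffix) and then asks the ResourceManager for the application status and the list of NodeManagers.

```shell
#!/bin/sh
# Job ID copied from the Pig log output above.
job_id="job_1539066382468_0147"

# The YARN application ID has the same numeric suffix as the MapReduce job ID,
# with the "job_" prefix replaced by "application_".
app_id="application_${job_id#job_}"
echo "$app_id"

# Only query the cluster if the yarn CLI is available (assumption: it is on PATH).
if command -v yarn >/dev/null 2>&1; then
    # Show the application's state (for example ACCEPTED or RUNNING) and diagnostics.
    yarn application -status "$app_id"
    # List all NodeManagers and their states as seen by the ResourceManager.
    yarn node -list -all
fi
```

An application that stays in ACCEPTED and never moves to RUNNING has typically not been given a container for its ApplicationMaster, which would point toward the YARN memory settings rather than the Pig script itself.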