Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

PigStorage in mapreduce mode

Solved Go to solution

PigStorage in mapreduce mode

Expert Contributor

Hi,

I am trying to execute pig script in mapreduce mode, script is simple:

grunt> sourceData = load 'hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv' using PigStorage(';') as (nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray);

File is stored in HDFS:

hadoop fs -ls hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv

-rw-r--r--  3 hdfs hdfs  6828 2016-02-04 23:55 hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv

Error that i got:

Failed Jobs: JobId Alias Feature Message Outputs job_1454609613558_0003 sourceData MAP_ONLY Message: Job failed! hdfs://sandbox.hortonworks.com:8020/tmp/temp-710368608/tmp-1611282262,

Input(s): Failed to read data from "hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv"

Output(s): Failed to produce result in "hdfs://sandbox.hortonworks.com:8020/tmp/temp-710368608/tmp-1611282262"

Pig Stack Trace---------------ERROR 1066: Unable to open iterator for alias sourceDataorg.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sourceData  at org.apache.pig.PigServer.openIterator(PigServer.java:935)  at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)  at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)  at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)  at org.apache.pig.Main.run(Main.java:565)  at org.apache.pig.Main.main(Main.java:177)  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)  at java.lang.reflect.Method.invoke(Method.java:606)  at org.apache.hadoop.util.RunJar.run(RunJar.java:221)  at org.apache.hadoop.util.RunJar.main(RunJar.java:136)Caused by: java.io.IOException: Job terminated with anomalous status FAILED  at org.apache.pig.PigServer.openIterator(PigServer.java:927)  ... 13 more
1 ACCEPTED SOLUTION

Accepted Solutions

Re: PigStorage in mapreduce mode

New Contributor

First, the error stack does not tell much. You will need to go to MapReduce WebUI, click the job and find the real error message. Second, your input is a csv file, and you use ; as delimit for PigStorage, that sounds wrong unless you are sure that's the case.

17 REPLIES 17

Re: PigStorage in mapreduce mode

Mentor

@John Smith

How did you launch Pig Grunt?

On sandbox you can use tez

pig -x tez

You can refer to your dataset like so:

'/src/filename.csv' you don't need to explicitly set hdfs scheme. Also, make sure the src directory has permissions for the user you are executing with. Also, last time I looked at your dataset, I thought the delimeter was comma and not semicolon.

Re: PigStorage in mapreduce mode

Expert Contributor

needles to say, this is insane.

Yes, grunt by -x mapreduce, i tried -x tez but:

2016-02-05 00:37:42,172 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias sourceDataDetails at logfile: /home/hdfs/pig_1454632554431.log
privileges are correct:

drwxr-xr-x   - hdfs   hdfs            0 2016-02-04 23:55 /src

delimiter is is ;

any idea?

Re: PigStorage in mapreduce mode

Mentor

@John Smith upload your dataset I will try it out in a bit.

Re: PigStorage in mapreduce mode

Expert Contributor
Highlighted

Re: PigStorage in mapreduce mode

New Contributor

I got the same issue in hortonworks sandbox environment. Script was correct but was throwing this error

Unable to open iterator foralias

I found Jobhistory server was not working by default. I could not relate the connection between the two but after starting histoyserver , my pig script worked in both tez and mapreduce mode. Try it if it works for yoou as well.

[mapred@sandbox ~]$ cd /usr/hdp/current/hadoop-mapreduce-historyserver/sbin
[mapred@sandbox sbin]$ ls 
mr-jobhistory-daemon.sh
[mapred@sandbox sbin]$ mr-jobhistory-daemon.sh start historyserver

Re: PigStorage in mapreduce mode

New Contributor

I run successfully with your load statement follow by a dump on 2.3 sandbox. What's your complete script?

Re: PigStorage in mapreduce mode

Mentor
@John Smith

something is wrong with your environment, I was able to execute your statements in mapred and tez modes

grunt> sourceData = load 'CustomerData.csv' using PigStorage(';') as (nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray);
grunt> describe sourceData;
sourceData: {nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray}
grunt> b = limit sourceData 5;
grunt> dump b;
2016-02-05 02:43:32,930 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: LIMIT
2016-02-05 02:43:33,105 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:33,106 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-02-05 02:43:33,179 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-02-05 02:43:33,179 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-02-05 02:43:33,209 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:43:33,256 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:43:33,257 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-02-05 02:43:33,305 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt__0001_m_000001_1' to hdfs://c6401.ambari.apache.org:8020/tmp/temp2063345867/tmp1865027526/_temporary/0/task__0001_m_000001
2016-02-05 02:43:33,333 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:33,336 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:43:33,336 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
("Ronni Engelmann",93117643,"WBA68251082969954","1971-11-15",41,10.26)
("Kina Buttars",12452346,"WBA32649710927373","1968-08-14",68,10.551)
("Caren Rodman",18853438,"WBA56064572124841","1987-01-24",96,6.779)
("Tierra Bork",89673290,"WBA69315467645466","1958-11-22",52,10.109)
("Thelma Steve",97170856,"WBA73739033913927","1985-12-03",98,5.081)

and output of illustrate in mapred mode

("Leonel Bullen",50258523,"WBA23530058599244","1986-08-26",27,8.673)
2016-02-05 02:43:47,393 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,393 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,393 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,394 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,401 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,452 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,459 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C:  R:
2016-02-05 02:43:47,459 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,470 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,470 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,473 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,473 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,520 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,536 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C:  R:
2016-02-05 02:43:47,538 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,542 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,542 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,545 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,545 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,589 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,606 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C:  R:
2016-02-05 02:43:47,606 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,612 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,612 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,613 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,613 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,665 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,668 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C:  R:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| sourceData     | nullname:chararray     | customerId:chararray     | VIN:chararray       | Birthdate:chararray     | Mileage:chararray     | Fuel_Consumption:chararray     |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                | "Leonel Bullen"        | 50258523                 | "WBA23530058599244" | "1986-08-26"            | 27                    | 8.673                          |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

and in tez mode

grunt> sourceData = load 'CustomerData.csv' using PigStorage(';') as (nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray);
grunt> describe sourceData;
sourceData: {nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray}
grunt> b = limit sourceData 5;
grunt> dump b;
2016-02-05 02:46:17,619 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: LIMIT
2016-02-05 02:46:17,698 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:46:17,749 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-02-05 02:46:18,039 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-02-05 02:46:18,039 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-02-05 02:46:18,143 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:46:18,274 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:46:18,288 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-02-05 02:46:18,711 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt__0001_m_000001_1' to hdfs://c6401.ambari.apache.org:8020/tmp/temp-785652652/tmp136925164/_temporary/0/task__0001_m_000001
2016-02-05 02:46:18,782 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:46:18,811 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:46:18,811 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
("Ronni Engelmann",93117643,"WBA68251082969954","1971-11-15",41,10.26)
("Kina Buttars",12452346,"WBA32649710927373","1968-08-14",68,10.551)
("Caren Rodman",18853438,"WBA56064572124841","1987-01-24",96,6.779)
("Tierra Bork",89673290,"WBA69315467645466","1958-11-22",52,10.109)
("Thelma Steve",97170856,"WBA73739033913927","1985-12-03",98,5.081)

there was a bug where illustrate command doesn't work in tez mode yet.

bottom line, I tested it with 'CustomerData.csv' also with '/user/root/CustomerData.csv' and 'hdfs://fqdn:8020/user/root/CustomerData.csv'.

Re: PigStorage in mapreduce mode

New Contributor

First, the error stack does not tell much. You will need to go to MapReduce WebUI, click the job and find the real error message. Second, your input is a csv file, and you use ; as delimit for PigStorage, that sounds wrong unless you are sure that's the case.

Re: PigStorage in mapreduce mode

Expert Contributor

then what kind of issue with environment it could be?

I only executed menitoned command, nothing else.