
PigStorage in mapreduce mode

Expert Contributor


I am trying to execute a Pig script in mapreduce mode. The script is simple:

grunt> sourceData = load 'hdfs://' using PigStorage(';') as (nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray);

The file is stored in HDFS:

hadoop fs -ls hdfs://

-rw-r--r--  3 hdfs hdfs  6828 2016-02-04 23:55 hdfs://

The error I got:

Failed Jobs:
JobId	Alias	Feature	Message	Outputs
job_1454609613558_0003	sourceData	MAP_ONLY	Message: Job failed!	hdfs://,

Input(s): Failed to read data from "hdfs://"

Output(s): Failed to produce result in "hdfs://"

Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias sourceData

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sourceData
  at org.apache.pig.PigServer.openIterator(
  at
  at
  at
  at
  at
  at
  at org.apache.pig.Main.main(
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(
  at java.lang.reflect.Method.invoke(
  at
  at org.apache.hadoop.util.RunJar.main(
Caused by: Job terminated with anomalous status FAILED
  at org.apache.pig.PigServer.openIterator(
  ... 13 more




Master Mentor

@John Smith

How did you launch the Pig Grunt shell?

On the sandbox you can use Tez:

pig -x tez

You can refer to your dataset like so:

'/src/filename.csv'

You don't need to set the hdfs scheme explicitly. Also, make sure the src directory has permissions for the user you are executing as. Finally, the last time I looked at your dataset, I thought the delimiter was a comma, not a semicolon.
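One quick way to settle the delimiter question is to look at the first line of the file. The sketch below is only an illustration: `guess_delimiter` is a hypothetical helper name, and it assumes the only candidates are `;` and `,` and that the first line is representative of the whole file.

```shell
# Hypothetical helper: report whether ';' or ',' occurs more often on the
# first line of the input. Prints 'unknown' when the counts are equal.
guess_delimiter() {
    # With no argument, cat reads standard input, so this also works in a pipe.
    first_line=$(cat "$@" | head -n 1)
    # Count each candidate by deleting every other character.
    semis=$(( $(printf '%s' "$first_line" | tr -cd ';' | wc -c) ))
    commas=$(( $(printf '%s' "$first_line" | tr -cd ',' | wc -c) ))
    if [ "$semis" -gt "$commas" ]; then
        echo ';'
    elif [ "$commas" -gt "$semis" ]; then
        echo ','
    else
        echo 'unknown'
    fi
}

# Usage against HDFS (assumes the hadoop CLI is on PATH):
#   hadoop fs -cat /src/filename.csv | guess_delimiter
```

Whatever it prints is the character to pass to PigStorage in the load statement.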

Expert Contributor

Needless to say, this is insane.

Yes, I launched Grunt with -x mapreduce. I also tried -x tez, but got:

2016-02-05 00:37:42,172 [main] ERROR - ERROR 1066: Unable to open iterator for alias sourceData
Details at logfile: /home/hdfs/pig_1454632554431.log

Permissions are correct:

drwxr-xr-x   - hdfs   hdfs            0 2016-02-04 23:55 /src

The delimiter is ;

Any ideas?
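ERROR 1066 on its own says little, since it is raised whenever the job behind a dump fails; the underlying cause is usually in the log file that the message points to. A minimal sketch for pulling the relevant lines out of that log follows. `pig_log_causes` is a hypothetical helper name, and it assumes the log uses the usual "ERROR nnnn:" and "Caused by:" line formats.

```shell
# Hypothetical helper: print the error and root-cause lines from a Pig
# client log, each prefixed with its line number in the file.
pig_log_causes() {
    grep -n -E 'ERROR [0-9]+:|Caused by:' "$1"
}

# Example with the log file named in the error message above:
#   pig_log_causes /home/hdfs/pig_1454632554431.log
#
# If YARN log aggregation is enabled, the task-side logs for the failed job
# can also be fetched (the application id mirrors the job id):
#   yarn logs -applicationId application_1454609613558_0003
```

The line numbers make it easy to jump to the full stack trace around each match.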

Master Mentor

@John Smith, upload your dataset and I will try it out in a bit.

Expert Contributor


I hit the same issue in the Hortonworks sandbox environment. The script was correct but threw this error:

Unable to open iterator for alias

I found that the JobHistory server was not running by default. I could not work out the connection between the two, but after starting the history server, my Pig script worked in both tez and mapreduce modes. Try it and see if it works for you as well.

[mapred@sandbox ~]$ cd /usr/hdp/current/hadoop-mapreduce-historyserver/sbin
[mapred@sandbox sbin]$ ls
[mapred@sandbox sbin]$ ./mr-jobhistory-daemon.sh start historyserver
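
To confirm whether the history server is actually up, before or after starting it, the JVM list from `jps` can be checked. This is a sketch under assumptions: `has_history_server` is a hypothetical helper name, and it relies on `jps` printing one "pid ClassName" pair per line and on the daemon showing up as JobHistoryServer.

```shell
# Hypothetical helper: succeed if jps-style output on standard input lists
# a JobHistoryServer process.
has_history_server() {
    grep -q -E '^[0-9]+ JobHistoryServer$'
}

# Usage (jps normally lists only the current user's JVMs, so run it as the
# user that owns the daemon, or as root):
#   if jps | has_history_server; then
#       echo "history server is running"
#   else
#       echo "history server is NOT running"
#   fi
```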


I ran your load statement followed by a dump successfully on the 2.3 sandbox. What's your complete script?

Master Mentor
@John Smith

Something is wrong with your environment. I was able to execute your statements in both mapred and tez modes:

grunt> sourceData = load 'CustomerData.csv' using PigStorage(';') as (nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray);
grunt> describe sourceData;
sourceData: {nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray}
grunt> b = limit sourceData 5;
grunt> dump b;
2016-02-05 02:43:32,930 [main] INFO - Pig features used in the script: LIMIT
2016-02-05 02:43:33,105 [main] WARN - SchemaTupleBackend has already been initialized
2016-02-05 02:43:33,106 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-02-05 02:43:33,179 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-02-05 02:43:33,179 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-02-05 02:43:33,209 [main] INFO - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:43:33,256 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:43:33,257 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-02-05 02:43:33,305 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt__0001_m_000001_1' to hdfs://
2016-02-05 02:43:33,333 [main] WARN - SchemaTupleBackend has already been initialized
2016-02-05 02:43:33,336 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:43:33,336 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
("Ronni Engelmann",93117643,"WBA68251082969954","1971-11-15",41,10.26)
("Kina Buttars",12452346,"WBA32649710927373","1968-08-14",68,10.551)
("Caren Rodman",18853438,"WBA56064572124841","1987-01-24",96,6.779)
("Tierra Bork",89673290,"WBA69315467645466","1958-11-22",52,10.109)
("Thelma Steve",97170856,"WBA73739033913927","1985-12-03",98,5.081)

and output of illustrate in mapred mode

("Leonel Bullen",50258523,"WBA23530058599244","1986-08-26",27,8.673)
2016-02-05 02:43:47,393 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,393 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,393 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,394 [main] INFO - Pig script settings are added to the job
2016-02-05 02:43:47,401 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,452 [main] WARN - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,459 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C:  R:
2016-02-05 02:43:47,459 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,470 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,470 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,473 [main] INFO - Pig script settings are added to the job
2016-02-05 02:43:47,473 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,520 [main] WARN - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,536 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C:  R:
2016-02-05 02:43:47,538 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,542 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,542 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,545 [main] INFO - Pig script settings are added to the job
2016-02-05 02:43:47,545 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,589 [main] WARN - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,606 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C:  R:
2016-02-05 02:43:47,606 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,612 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,612 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,613 [main] INFO - Pig script settings are added to the job
2016-02-05 02:43:47,613 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,665 [main] WARN - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,668 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C:  R:
| sourceData     | nullname:chararray     | customerId:chararray     | VIN:chararray       | Birthdate:chararray     | Mileage:chararray     | Fuel_Consumption:chararray     |
|                | "Leonel Bullen"        | 50258523                 | "WBA23530058599244" | "1986-08-26"            | 27                    | 8.673                          |

and in tez mode

grunt> sourceData = load 'CustomerData.csv' using PigStorage(';') as (nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray);
grunt> describe sourceData;
sourceData: {nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray}
grunt> b = limit sourceData 5;
grunt> dump b;
2016-02-05 02:46:17,619 [main] INFO - Pig features used in the script: LIMIT
2016-02-05 02:46:17,698 [main] INFO - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:46:17,749 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-02-05 02:46:18,039 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-02-05 02:46:18,039 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-02-05 02:46:18,143 [main] INFO - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:46:18,274 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:46:18,288 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-02-05 02:46:18,711 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt__0001_m_000001_1' to hdfs://
2016-02-05 02:46:18,782 [main] WARN - SchemaTupleBackend has already been initialized
2016-02-05 02:46:18,811 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:46:18,811 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
("Ronni Engelmann",93117643,"WBA68251082969954","1971-11-15",41,10.26)
("Kina Buttars",12452346,"WBA32649710927373","1968-08-14",68,10.551)
("Caren Rodman",18853438,"WBA56064572124841","1987-01-24",96,6.779)
("Tierra Bork",89673290,"WBA69315467645466","1958-11-22",52,10.109)
("Thelma Steve",97170856,"WBA73739033913927","1985-12-03",98,5.081)

Note that the illustrate command doesn't work in tez mode yet because of a bug.

Bottom line: I tested it with 'CustomerData.csv', with '/user/root/CustomerData.csv', and with 'hdfs://fqdn:8020/user/root/CustomerData.csv'.




Expert Contributor

Then what kind of environment issue could it be?

I only executed the command mentioned above, nothing else.