Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

PigStorage in mapreduce mode

avatar
Expert Contributor

Hi,

I am trying to execute pig script in mapreduce mode, script is simple:

grunt> sourceData = load 'hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv' using PigStorage(';') as (nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray);

File is stored in HDFS:

hadoop fs -ls hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv

-rw-r--r--  3 hdfs hdfs  6828 2016-02-04 23:55 hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv

Error that i got:

Failed Jobs: JobId Alias Feature Message Outputs job_1454609613558_0003 sourceData MAP_ONLY Message: Job failed! hdfs://sandbox.hortonworks.com:8020/tmp/temp-710368608/tmp-1611282262,

Input(s): Failed to read data from "hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv"

Output(s): Failed to produce result in "hdfs://sandbox.hortonworks.com:8020/tmp/temp-710368608/tmp-1611282262"

Pig Stack Trace---------------ERROR 1066: Unable to open iterator for alias sourceDataorg.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sourceData  at org.apache.pig.PigServer.openIterator(PigServer.java:935)  at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)  at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)  at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)  at org.apache.pig.Main.run(Main.java:565)  at org.apache.pig.Main.main(Main.java:177)  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)  at java.lang.reflect.Method.invoke(Method.java:606)  at org.apache.hadoop.util.RunJar.run(RunJar.java:221)  at org.apache.hadoop.util.RunJar.main(RunJar.java:136)Caused by: java.io.IOException: Job terminated with anomalous status FAILED  at org.apache.pig.PigServer.openIterator(PigServer.java:927)  ... 13 more
1 ACCEPTED SOLUTION

avatar
Explorer
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login
17 REPLIES 17

avatar
Master Mentor

@John Smith

How did you launch Pig Grunt?

On sandbox you can use tez

pig -x tez

You can refer to your dataset like so:

'/src/filename.csv' you don't need to explicitly set hdfs scheme. Also, make sure the src directory has permissions for the user you are executing with. Also, last time I looked at your dataset, I thought the delimeter was comma and not semicolon.

avatar
Expert Contributor

needles to say, this is insane.

Yes, grunt by -x mapreduce, i tried -x tez but:

2016-02-05 00:37:42,172 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias sourceDataDetails at logfile: /home/hdfs/pig_1454632554431.log
privileges are correct:

drwxr-xr-x   - hdfs   hdfs            0 2016-02-04 23:55 /src

delimiter is is ;

any idea?

avatar
Master Mentor

@John Smith upload your dataset I will try it out in a bit.

avatar
Expert Contributor

avatar
Contributor

I got the same issue in hortonworks sandbox environment. Script was correct but was throwing this error

Unable to open iterator foralias

I found Jobhistory server was not working by default. I could not relate the connection between the two but after starting histoyserver , my pig script worked in both tez and mapreduce mode. Try it if it works for yoou as well.

[mapred@sandbox ~]$ cd /usr/hdp/current/hadoop-mapreduce-historyserver/sbin
[mapred@sandbox sbin]$ ls 
mr-jobhistory-daemon.sh
[mapred@sandbox sbin]$ mr-jobhistory-daemon.sh start historyserver

avatar
Explorer

I run successfully with your load statement follow by a dump on 2.3 sandbox. What's your complete script?

avatar
Master Mentor
@John Smith

something is wrong with your environment, I was able to execute your statements in mapred and tez modes

grunt> sourceData = load 'CustomerData.csv' using PigStorage(';') as (nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray);
grunt> describe sourceData;
sourceData: {nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray}
grunt> b = limit sourceData 5;
grunt> dump b;
2016-02-05 02:43:32,930 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: LIMIT
2016-02-05 02:43:33,105 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:33,106 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-02-05 02:43:33,179 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-02-05 02:43:33,179 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-02-05 02:43:33,209 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:43:33,256 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:43:33,257 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-02-05 02:43:33,305 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt__0001_m_000001_1' to hdfs://c6401.ambari.apache.org:8020/tmp/temp2063345867/tmp1865027526/_temporary/0/task__0001_m_000001
2016-02-05 02:43:33,333 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:33,336 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:43:33,336 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
("Ronni Engelmann",93117643,"WBA68251082969954","1971-11-15",41,10.26)
("Kina Buttars",12452346,"WBA32649710927373","1968-08-14",68,10.551)
("Caren Rodman",18853438,"WBA56064572124841","1987-01-24",96,6.779)
("Tierra Bork",89673290,"WBA69315467645466","1958-11-22",52,10.109)
("Thelma Steve",97170856,"WBA73739033913927","1985-12-03",98,5.081)

and output of illustrate in mapred mode

("Leonel Bullen",50258523,"WBA23530058599244","1986-08-26",27,8.673)
2016-02-05 02:43:47,393 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,393 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,393 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,394 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,401 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,452 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,459 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C:  R:
2016-02-05 02:43:47,459 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,470 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,470 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,473 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,473 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,520 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,536 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C:  R:
2016-02-05 02:43:47,538 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,542 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,542 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,545 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,545 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,589 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,606 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C:  R:
2016-02-05 02:43:47,606 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,612 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,612 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,613 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,613 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,665 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,668 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C:  R:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| sourceData     | nullname:chararray     | customerId:chararray     | VIN:chararray       | Birthdate:chararray     | Mileage:chararray     | Fuel_Consumption:chararray     |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                | "Leonel Bullen"        | 50258523                 | "WBA23530058599244" | "1986-08-26"            | 27                    | 8.673                          |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

and in tez mode

grunt> sourceData = load 'CustomerData.csv' using PigStorage(';') as (nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray);
grunt> describe sourceData;
sourceData: {nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray}
grunt> b = limit sourceData 5;
grunt> dump b;
2016-02-05 02:46:17,619 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: LIMIT
2016-02-05 02:46:17,698 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:46:17,749 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-02-05 02:46:18,039 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-02-05 02:46:18,039 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-02-05 02:46:18,143 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:46:18,274 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:46:18,288 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-02-05 02:46:18,711 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt__0001_m_000001_1' to hdfs://c6401.ambari.apache.org:8020/tmp/temp-785652652/tmp136925164/_temporary/0/task__0001_m_000001
2016-02-05 02:46:18,782 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:46:18,811 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:46:18,811 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
("Ronni Engelmann",93117643,"WBA68251082969954","1971-11-15",41,10.26)
("Kina Buttars",12452346,"WBA32649710927373","1968-08-14",68,10.551)
("Caren Rodman",18853438,"WBA56064572124841","1987-01-24",96,6.779)
("Tierra Bork",89673290,"WBA69315467645466","1958-11-22",52,10.109)
("Thelma Steve",97170856,"WBA73739033913927","1985-12-03",98,5.081)

there was a bug where illustrate command doesn't work in tez mode yet.

bottom line, I tested it with 'CustomerData.csv' also with '/user/root/CustomerData.csv' and 'hdfs://fqdn:8020/user/root/CustomerData.csv'.

avatar
Explorer
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login

avatar
Expert Contributor

then what kind of issue with environment it could be?

I only executed menitoned command, nothing else.