Created 02-05-2016 12:13 AM
Hi,
I am trying to execute a Pig script in MapReduce mode. The script is simple:
grunt> sourceData = load 'hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv' using PigStorage(';') as (nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray);
The file is stored in HDFS:
hadoop fs -ls hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv
-rw-r--r-- 3 hdfs hdfs 6828 2016-02-04 23:55 hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv
The error I got:
Failed Jobs:
JobId                   Alias       Feature   Message               Outputs
job_1454609613558_0003  sourceData  MAP_ONLY  Message: Job failed!  hdfs://sandbox.hortonworks.com:8020/tmp/temp-710368608/tmp-1611282262,

Input(s):
Failed to read data from "hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv"

Output(s):
Failed to produce result in "hdfs://sandbox.hortonworks.com:8020/tmp/temp-710368608/tmp-1611282262"

Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias sourceData

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sourceData
    at org.apache.pig.PigServer.openIterator(PigServer.java:935)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
    at org.apache.pig.Main.run(Main.java:565)
    at org.apache.pig.Main.main(Main.java:177)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
    at org.apache.pig.PigServer.openIterator(PigServer.java:927)
    ... 13 more
Created 02-05-2016 05:15 AM
First, the error stack does not tell you much. You need to go to the MapReduce web UI, click the failed job, and find the real error message there. Second, your input is a .csv file, yet you use ; as the delimiter for PigStorage. That sounds wrong unless you are sure the file really is semicolon-delimited.
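If log aggregation is enabled on the cluster (an assumption; it may not be switched on in the sandbox), the same task-level error can also be pulled from the command line with yarn logs. Pig prints the MapReduce job ID (job_1454609613558_0003 above), while yarn logs expects the YARN application ID, which carries the same cluster timestamp and sequence number. The helper name job_to_app_id is hypothetical; this is only a sketch:

```shell
#!/bin/sh
# Hypothetical helper: turn the MapReduce job ID that Pig prints
# (job_<clusterTimestamp>_<sequence>) into the matching YARN
# application ID (application_<clusterTimestamp>_<sequence>).
job_to_app_id() {
  printf 'application_%s\n' "${1#job_}"
}

# Usage (assumes the yarn client is on PATH and log aggregation is enabled):
#   job_to_app_id job_1454609613558_0003    # prints application_1454609613558_0003
#   yarn logs -applicationId "$(job_to_app_id job_1454609613558_0003)" | grep -n -A 5 'Caused by'
```

The grep simply jumps to the Java "Caused by" lines, which is usually where the real failure reason sits in a task log.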
Created 02-05-2016 12:23 AM
How did you launch Pig Grunt?
On the sandbox you can use Tez:
pig -x tez
You can refer to your dataset like so:
'/src/filename.csv'; you don't need to set the hdfs:// scheme explicitly. Also, make sure the /src directory grants permissions to the user you are running as. Finally, the last time I looked at your dataset, I thought the delimiter was a comma, not a semicolon.
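A quick way to settle the comma-versus-semicolon question is to count how many fields each line splits into under each candidate delimiter; the schema in the load statement expects 6. This is a sketch with a hypothetical helper name (field_counts), and it ignores CSV quoting, so a quoted value that contains the delimiter would be miscounted:

```shell
#!/bin/sh
# Hypothetical helper: read lines on stdin, split each one on the
# delimiter given as $1, and print every distinct field count seen.
# It does not understand CSV quoting, so it is only a rough check.
field_counts() {
  awk -F"$1" '{ print NF }' | sort -un
}

# Usage (assumes the hadoop client is on PATH); a single value of 6 for ';'
# would match the six-column schema used in the load statement:
#   hadoop fs -cat /src/CustomerData.csv | head -n 20 | field_counts ';'
#   hadoop fs -cat /src/CustomerData.csv | head -n 20 | field_counts ','
```

If the second command prints 1, the file contains no commas outside the data and ; is the right PigStorage argument.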
Created 02-05-2016 12:39 AM
Needless to say, this is insane.
Yes, I launched Grunt with -x mapreduce. I tried -x tez, but:
2016-02-05 00:37:42,172 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias sourceData
Details at logfile: /home/hdfs/pig_1454632554431.log
The privileges are correct:
drwxr-xr-x - hdfs hdfs 0 2016-02-04 23:55 /src
The delimiter is ;
Any ideas?
Created 02-05-2016 12:48 AM
@John Smith, upload your dataset and I will try it out in a bit.
Created 02-05-2016 12:59 AM
You can find the dataset here:
https://drive.google.com/file/d/0B6RZ_9vVuTEcTHllU1dIR2VBY1E/view?usp=sharing
Thank you.
Created 04-01-2017 01:40 AM
I got the same issue in the Hortonworks sandbox environment. The script was correct, but it was throwing this error:
Unable to open iterator for alias
I found that the JobHistory server was not running by default. I could not see the connection between the two, but after starting the history server, my Pig script worked in both Tez and MapReduce modes. Try it and see if it works for you as well.
[mapred@sandbox ~]$ cd /usr/hdp/current/hadoop-mapreduce-historyserver/sbin
[mapred@sandbox sbin]$ ls
mr-jobhistory-daemon.sh
[mapred@sandbox sbin]$ mr-jobhistory-daemon.sh start historyserver
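To check whether the history server is already up before starting it, one option is to look for its JVM in the jps listing. The helper below (history_server_running, a hypothetical name) only inspects whatever listing it is given. Running jps as the mapred user, so that the history server's JVM is visible, is an assumption about the sandbox setup:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the jps-style process listing
# on stdin contains the JobHistoryServer main class, fail otherwise.
history_server_running() {
  grep -q 'JobHistoryServer'
}

# Usage as the mapred user, reusing the daemon script shown above:
#   jps | history_server_running || \
#     /usr/hdp/current/hadoop-mapreduce-historyserver/sbin/mr-jobhistory-daemon.sh start historyserver
```

Calling the daemon script by its full path avoids depending on the sbin directory being on PATH.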
Created 02-05-2016 01:46 AM
I ran your load statement followed by a dump successfully on the 2.3 sandbox. What's your complete script?
Created 02-05-2016 02:49 AM
Something is wrong with your environment. I was able to execute your statements in both mapred and tez modes:
grunt> sourceData = load 'CustomerData.csv' using PigStorage(';') as (nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray);
grunt> describe sourceData;
sourceData: {nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray}
grunt> b = limit sourceData 5;
grunt> dump b;
2016-02-05 02:43:32,930 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: LIMIT
2016-02-05 02:43:33,105 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:33,106 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-02-05 02:43:33,179 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-02-05 02:43:33,179 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-02-05 02:43:33,209 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:43:33,256 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:43:33,257 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-02-05 02:43:33,305 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt__0001_m_000001_1' to hdfs://c6401.ambari.apache.org:8020/tmp/temp2063345867/tmp1865027526/_temporary/0/task__0001_m_000001
2016-02-05 02:43:33,333 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:33,336 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:43:33,336 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
("Ronni Engelmann",93117643,"WBA68251082969954","1971-11-15",41,10.26)
("Kina Buttars",12452346,"WBA32649710927373","1968-08-14",68,10.551)
("Caren Rodman",18853438,"WBA56064572124841","1987-01-24",96,6.779)
("Tierra Bork",89673290,"WBA69315467645466","1958-11-22",52,10.109)
("Thelma Steve",97170856,"WBA73739033913927","1985-12-03",98,5.081)
And the output of illustrate in mapred mode:
("Leonel Bullen",50258523,"WBA23530058599244","1986-08-26",27,8.673)
2016-02-05 02:43:47,393 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,393 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,393 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,394 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,401 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,452 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,459 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C: R:
2016-02-05 02:43:47,459 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,470 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,470 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,473 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,473 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,520 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,536 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C: R:
2016-02-05 02:43:47,538 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,542 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,542 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,545 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,545 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,589 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,606 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C: R:
2016-02-05 02:43:47,606 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-05 02:43:47,612 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-05 02:43:47,612 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-05 02:43:47,613 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-02-05 02:43:47,613 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-05 02:43:47,665 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:43:47,668 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: sourceData[4,13] C: R:
-------------------------------------------------------------------------------------------------------------------------------------------------------
| sourceData | nullname:chararray | customerId:chararray | VIN:chararray       | Birthdate:chararray | Mileage:chararray | Fuel_Consumption:chararray |
-------------------------------------------------------------------------------------------------------------------------------------------------------
|            | "Leonel Bullen"    | 50258523             | "WBA23530058599244" | "1986-08-26"        | 27                | 8.673                      |
-------------------------------------------------------------------------------------------------------------------------------------------------------
And in tez mode:
grunt> sourceData = load 'CustomerData.csv' using PigStorage(';') as (nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray);
grunt> describe sourceData;
sourceData: {nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray}
grunt> b = limit sourceData 5;
grunt> dump b;
2016-02-05 02:46:17,619 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: LIMIT
2016-02-05 02:46:17,698 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:46:17,749 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-02-05 02:46:18,039 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - File Output Committer Algorithm version is 1
2016-02-05 02:46:18,039 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2016-02-05 02:46:18,143 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-05 02:46:18,274 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:46:18,288 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-02-05 02:46:18,711 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt__0001_m_000001_1' to hdfs://c6401.ambari.apache.org:8020/tmp/temp-785652652/tmp136925164/_temporary/0/task__0001_m_000001
2016-02-05 02:46:18,782 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-02-05 02:46:18,811 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-05 02:46:18,811 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
("Ronni Engelmann",93117643,"WBA68251082969954","1971-11-15",41,10.26)
("Kina Buttars",12452346,"WBA32649710927373","1968-08-14",68,10.551)
("Caren Rodman",18853438,"WBA56064572124841","1987-01-24",96,6.779)
("Tierra Bork",89673290,"WBA69315467645466","1958-11-22",52,10.109)
("Thelma Steve",97170856,"WBA73739033913927","1985-12-03",98,5.081)
There was a bug at the time: the illustrate command did not work in tez mode yet.
Bottom line: I tested it with 'CustomerData.csv', with '/user/root/CustomerData.csv', and with 'hdfs://fqdn:8020/user/root/CustomerData.csv'.
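For anyone comparing those three forms: a relative path such as 'CustomerData.csv' is resolved against the HDFS home directory of the user running Pig, which is presumably why it behaved the same as '/user/root/CustomerData.csv' here. The sketch below models only that default rule; the helper name resolve_load_path is hypothetical, and it ignores a changed working directory (for example after cd in Grunt). Since the original file lives in /src, a bare file name would not find it when running as hdfs:

```shell
#!/bin/sh
# Hypothetical helper modelling the default rule only: absolute paths and
# fully qualified hdfs:// URIs are used as-is, while a relative path is
# resolved against /user/<user>, the default HDFS working directory.
resolve_load_path() {
  case "$1" in
    hdfs://*|/*) printf '%s\n' "$1" ;;
    *) printf '/user/%s/%s\n' "$2" "$1" ;;
  esac
}

# Usage:
#   resolve_load_path CustomerData.csv root        # prints /user/root/CustomerData.csv
#   resolve_load_path CustomerData.csv hdfs        # prints /user/hdfs/CustomerData.csv
#   resolve_load_path /src/CustomerData.csv hdfs   # prints /src/CustomerData.csv
```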
Created 02-05-2016 08:43 AM
Then what kind of issue with the environment could it be?
I only executed the mentioned command, nothing else.