Created 02-05-2016 12:13 AM
Hi,
I am trying to execute a Pig script in MapReduce mode. The script is simple:
grunt> sourceData = load 'hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv' using PigStorage(';') as (nullname: chararray,customerId: chararray,VIN: chararray,Birthdate: chararray,Mileage: chararray,Fuel_Consumption: chararray);
The file is stored in HDFS:
hadoop fs -ls hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv
-rw-r--r-- 3 hdfs hdfs 6828 2016-02-04 23:55 hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv
The error that I got:
Failed Jobs:
JobId                   Alias       Feature   Message               Outputs
job_1454609613558_0003  sourceData  MAP_ONLY  Message: Job failed!  hdfs://sandbox.hortonworks.com:8020/tmp/temp-710368608/tmp-1611282262,
Input(s): Failed to read data from "hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv"
Output(s): Failed to produce result in "hdfs://sandbox.hortonworks.com:8020/tmp/temp-710368608/tmp-1611282262"
Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias sourceData
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias sourceData
    at org.apache.pig.PigServer.openIterator(PigServer.java:935)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
    at org.apache.pig.Main.run(Main.java:565)
    at org.apache.pig.Main.main(Main.java:177)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
    at org.apache.pig.PigServer.openIterator(PigServer.java:927)
    ... 13 more
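A quick way to narrow this down (just a sketch on top of the load statement above; sample5 is a hypothetical alias) is to describe the schema and dump only a handful of rows before dumping the whole relation:
grunt> describe sourceData;
grunt> sample5 = limit sourceData 5;  -- sample5 is a hypothetical alias, just to preview a few rows
grunt> dump sample5;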
Created 02-05-2016 05:15 AM
First, the error stack does not tell much. You will need to go to the MapReduce Web UI, click the job, and find the real error message. Second, your input is a CSV file, and you use ';' as the delimiter for PigStorage; that sounds wrong unless you are sure that's the case.
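One quick way to confirm the delimiter (a sketch reusing the path from the post above) is to look at the first couple of lines of the file:
# peek at the first two lines to check whether ';' really separates the fields
hadoop fs -cat hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv | head -n 2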
Created 02-05-2016 09:03 AM
This is odd:
when I do
it works for me also; when I don't limit the result set and just execute dump sourceData; I am getting the same error.
Created 02-05-2016 11:38 AM
I think it crashed on me when I dumped the whole dataset; there might be a problem with your dataset further down. @John Smith
Created 02-05-2016 02:45 PM
I'm 100% sure there is no problem with the input dataset; I kept only the first 5 records in the file and it's the same issue.
Created 02-05-2016 03:04 PM
@John Smith you got me there; as you can see, my attempt with your file worked. Alternatively, take a look at CSVExcelStorage, as it has more capabilities than PigStorage. link
I am not saying this is the case, and I don't know what's wrong, but here's a note. I'm not sure how valid it still is, as it has been around for a while and they don't mention which version of Pig they were using:
PigStorage is an extremely simple loader that does not handle special cases such as embedded delimiters or escaped control characters; it will split on every instance of the delimiter regardless of context. For this reason, when loading a CSV file it is recommended to use CSVExcelStorage rather than PigStorage with a comma delimiter.
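For reference, a minimal sketch of loading with CSVExcelStorage; it ships in piggybank, so the jar has to be registered first (the jar path below is an assumption, adjust it to wherever piggybank.jar lives on your install):
grunt> register /usr/hdp/current/pig-client/piggybank.jar;  -- assumed sandbox path for piggybank.jar
grunt> sourceData = load 'hdfs://sandbox.hortonworks.com:8020/src/CustomerData.csv' using org.apache.pig.piggybank.storage.CSVExcelStorage(';', 'NO_MULTILINE', 'UNIX') as (nullname: chararray, customerId: chararray, VIN: chararray, Birthdate: chararray, Mileage: chararray, Fuel_Consumption: chararray);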
Created 02-05-2016 04:01 PM
Well, CSVExcelStorage doesn't work either...
2016-02-05 16:01:28,917 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2016-02-05 16:01:29,745 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias sourceData
Details at logfile: /home/hdfs/pig_1454687855333.log
grunt>
I'm confused... what is it?
Created 02-05-2016 04:17 PM
@John Smith if you identified another bug, I'm going to buy a lottery ticket.
Created 02-05-2016 06:53 PM
As I commented above, I cannot reproduce the error. The error you posted is too general. Can you go to the Hadoop Web UI and get the detailed message?
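If the Web UI is hard to reach from the sandbox, the same detail can usually be pulled from the command line with the application id of the failed run (a sketch; <application_id> is a placeholder):
yarn logs -applicationId <application_id>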
Created 02-09-2016 01:37 PM
It's strange you can't reproduce the error. Does it work for you?
Application application_1454923438220_0007 failed 2 times due to AM Container for appattempt_1454923438220_0007_000002 exited with exitCode: 1
For more detailed output, check application tracking page: http://sandbox.hortonworks.com:8088/cluster/app/application_1454923438220_0007 Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e10_1454923438220_0007_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
    at org.apache.hadoop.util.Shell.run(Shell.java:487)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.