Created 12-08-2016 05:36 PM
I can't run Pig on the Hortonworks Sandbox in Azure. I keep getting this error:
Input(s): Failed to read data from "hdfs://sandbox.hortonworks.com:8020/tmp/demo/data/drivers.tsv"
Any ideas what the issue could be? I can't seem to get this to work.
Created 12-08-2016 07:08 PM
@Fru N. you are trying to load /tmp/demo/piglab04/data/drivers.tsv, but your file has a .csv extension. Change the extension in your LOAD statement from .tsv to .csv and it should work.
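For example, a minimal sketch of the corrected load (assuming the data under /tmp/demo/piglab04/data really is comma-delimited, which the .csv extension suggests):
grunt> drivers = LOAD '/tmp/demo/piglab04/data/drivers.csv' USING PigStorage(',');
grunt> dump drivers;
If the file were genuinely tab-separated, PigStorage() with no argument (tab is its default delimiter) would be the loader to use instead, but the directory listings later in this thread show only .csv files in that location.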
Created 12-08-2016 05:39 PM
@Fru N. can you check whether the file is present at the specified location, and if yes, what are the permissions for that file?
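A quick way to check both from the shell, as a sketch (assuming the same path your LOAD statement points at):
hadoop fs -ls /tmp/demo/piglab04/data/
hadoop fs -cat /tmp/demo/piglab04/data/drivers.tsv | head -n 5
The -ls output shows whether the file is there along with its owner and permissions, and the -cat piped through head confirms it is readable by the user running Pig.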
Created 12-08-2016 05:59 PM
Yes, I checked, and the file is present. I gave it 777 permissions.
Since it's Azure, I'm not sure if there are additional settings I need to change in my hdfs-site.xml.
Created 12-08-2016 06:02 PM
@Fru N., you do not have to change anything in hdfs-site.xml. Can you please send the entire job log?
Created 12-08-2016 06:46 PM
grunt> drivers = LOAD '/tmp/demo/piglab04/data/drivers.tsv' USING PigStorage(',');
grunt> dump drivers;
2016-12-08 18:44:08,284 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2016-12-08 18:44:08,386 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-12-08 18:44:08,451 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-12-08 18:44:08,611 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Tez staging directory is /tmp/fnhdp/staging and resources directory is /tmp/temp1011891030
2016-12-08 18:44:08,659 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.plan.TezCompiler - File concatenation threshold: 100 optimistic? false
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/drivers.tsv
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
    at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateNewSplits(MRInputHelpers.java:412)
    at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:291)
    at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.LoaderProcessor.processLoads(LoaderProcessor.java:171)
    at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.LoaderProcessor.visitTezOp(LoaderProcessor.java:183)
    at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:249)
    at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:55)
    at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.processLoadAndParallelism(TezLauncher.java:461)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:171)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:304)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1431)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1416)
    at org.apache.pig.PigServer.storeEx(PigServer.java:1075)
    at org.apache.pig.PigServer.store(PigServer.java:1038)
    at org.apache.pig.PigServer.openIterator(PigServer.java:951)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
    at org.apache.pig.Main.run(Main.java:565)
    at org.apache.pig.Main.main(Main.java:177)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/drivers.tsv
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)
    ... 29 more
2016-12-08 18:44:08,981 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2118: Input path does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/drivers.tsv
Details at logfile: /home/fnhdp/pig_1481222529518.log
Created 12-08-2016 06:48 PM
@Mushtaq Rizvi - Note: the file does exist. I have confirmed that.
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/drivers.tsv
Created 12-08-2016 06:50 PM
It says that the input path does not exist. Can you also show the output of this command:
hdfs dfs -ls /tmp/demo/piglab04/data/
Created 12-08-2016 06:58 PM
hadoop fs -ls hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data
Found 3 items
-rwxrwxrwx 3 maria_dev hdfs 2043 2016-12-08 16:05 hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/drivers.csv
-rwxrwxrwx 3 maria_dev hdfs 26205 2016-12-08 16:05 hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/timesheet.csv
-rwxrwxrwx 3 maria_dev hdfs 2272077 2016-12-08 16:05 hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/truck_event_text_partition.csv
Created 12-08-2016 07:00 PM
hdfs dfs -ls /tmp/demo/piglab04/data/
Found 3 items
-rwxrwxrwx 3 maria_dev hdfs 2043 2016-12-08 16:05 /tmp/demo/piglab04/data/drivers.csv
-rwxrwxrwx 3 maria_dev hdfs 26205 2016-12-08 16:05 /tmp/demo/piglab04/data/timesheet.csv
-rwxrwxrwx 3 maria_dev hdfs 2272077 2016-12-08 16:05 /tmp/demo/piglab04/data/truck_event_text_partition.csv
Created 12-08-2016 07:07 PM
For reference, I'm using the Microsoft Azure Hortonworks Sandbox with HDP 2.4.