Pig on Hortonworks Sandbox In Azure

Explorer

I can't run Pig on the Hortonworks Sandbox in Azure. I keep getting this error:

Input(s):                                                                                                                                            
Failed to read data from "hdfs://sanbox.hortonworks.com:8020/tmp/demo/data/drivers.tsv" 

Any ideas what the issue could be? I can't seem to get this to work.

1 ACCEPTED SOLUTION

Super Collaborator

@Fru N. You are trying to load /tmp/demo/piglab04/data/drivers.tsv, but the file in HDFS has a .csv extension. Change drivers.tsv to drivers.csv in the LOAD statement and it should work.
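
For reference, a minimal sketch of the corrected statement. It assumes the file is the drivers.csv shown in the HDFS listing posted in this thread and keeps the comma delimiter from the original LOAD; nothing else is changed.

```pig
-- The HDFS listing shows drivers.csv (not drivers.tsv), so LOAD points at the .csv file.
-- PigStorage(',') is carried over unchanged from the original statement.
drivers = LOAD '/tmp/demo/piglab04/data/drivers.csv' USING PigStorage(',');

-- Print the relation to confirm the input path now resolves.
DUMP drivers;
```

If the path is still reported as missing, comparing the LOAD path character by character against the output of hdfs dfs -ls is a quick way to spot a mismatch like this one.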

13 REPLIES

Super Collaborator

@Fru N. Can you check whether the file is present at the specified location and, if so, what its permissions are?

Explorer

Yes, I checked, and the file is present. I gave it 777 permissions.

Since it's Azure, I'm not sure whether there are additional settings I need to change in my hdfs-site.xml.

Super Collaborator

@Fru N., you do not have to change anything in hdfs-site.xml. Can you please send the entire job log?

Explorer

@Mushtaq Rizvi

grunt> drivers = LOAD '/tmp/demo/piglab04/data/drivers.tsv' USING PigStorage(',');

grunt> dump drivers;                                                                                                                                                                    
2016-12-08 18:44:08,284 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN                                                               
2016-12-08 18:44:08,386 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.                                              
2016-12-08 18:44:08,451 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-12-08 18:44:08,611 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Tez staging directory is /tmp/fnhdp/staging and resources directory is /tmp/temp1011891030
2016-12-08 18:44:08,659 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.plan.TezCompiler - File concatenation threshold: 100 optimistic? false                           
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/drivers.tsv                     
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)                                                               
        at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateNewSplits(MRInputHelpers.java:412)                                                                                    
        at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:291)                                                                             
        at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.LoaderProcessor.processLoads(LoaderProcessor.java:171)                                                      
        at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.LoaderProcessor.visitTezOp(LoaderProcessor.java:183)                                                        
        at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:249)                                                                               
        at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:55)                                                                                
        at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)                                                                                           
        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)                                                                                                              
        at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.processLoadAndParallelism(TezLauncher.java:461)                                                                
        at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:171)                                                                                
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:304)                                                                          
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1431)                                                                                                                     
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1416)                                                                                                     
        at org.apache.pig.PigServer.storeEx(PigServer.java:1075)                                                                                                                        
        at org.apache.pig.PigServer.store(PigServer.java:1038)                                                                                                                          
        at org.apache.pig.PigServer.openIterator(PigServer.java:951)                                                                                                                    
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)                                                                                                     
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)                                                                                        
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)                                                                                                
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)                                                                                                
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)                                                                                                                          
        at org.apache.pig.Main.run(Main.java:565)                                                                                                                                       
        at org.apache.pig.Main.main(Main.java:177)                                                                                                                                      
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)                                                                                                                  
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)                                                                                                
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)                                                                                        
        at java.lang.reflect.Method.invoke(Method.java:606)                                                                                                                             
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)                                                                                                                           
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)                                                                                                                          
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/drivers.tsv               
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)                                                                     
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)                                                                                   
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)                                                       
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)                                                                                    
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)                                                               
        ... 29 more                                                                                                                                                                     
2016-12-08 18:44:08,981 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2118: Input path does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/drivers.tsv    

Details at logfile: /home/fnhdp/pig_1481222529518.log

Explorer

@Mushtaq Rizvi Note: the file does exist. I have confirmed that.

Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/drivers.tsv

Super Collaborator

It says that the input path does not exist. Can you also show the output of this command:

hdfs dfs -ls /tmp/demo/piglab04/data/

Explorer
hadoop fs -ls hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data
Found 3 items
-rwxrwxrwx   3 maria_dev hdfs       2043 2016-12-08 16:05 hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/drivers.csv
-rwxrwxrwx   3 maria_dev hdfs      26205 2016-12-08 16:05 hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/timesheet.csv
-rwxrwxrwx   3 maria_dev hdfs    2272077 2016-12-08 16:05 hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/truck_event_text_partition.csv

Explorer
hdfs dfs -ls /tmp/demo/piglab04/data/
Found 3 items
-rwxrwxrwx   3 maria_dev hdfs       2043 2016-12-08 16:05 /tmp/demo/piglab04/data/drivers.csv
-rwxrwxrwx   3 maria_dev hdfs      26205 2016-12-08 16:05 /tmp/demo/piglab04/data/timesheet.csv
-rwxrwxrwx   3 maria_dev hdfs    2272077 2016-12-08 16:05 /tmp/demo/piglab04/data/truck_event_text_partition.csv

Explorer

For reference, I'm using the Microsoft Azure Hortonworks Sandbox with HDP 2.4.