Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Pig on Hortonworks Sandbox In Azure

New Member

I can't run Pig on the Hortonworks Sandbox in Azure. I keep getting this error.

Input(s):                                                                                                                                            
Failed to read data from "hdfs://sanbox.hortonworks.com:8020/tmp/demo/data/drivers.tsv" 

Any ideas what the issue could be? I can't seem to get this to work.

1 ACCEPTED SOLUTION

Super Collaborator

@Fru N. You are trying to load /tmp/demo/piglab04/data/drivers.tsv, but the file in HDFS is named drivers.csv. Change the extension in your LOAD statement from .tsv to .csv and it should work.
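
A minimal sketch of that fix as a shell wrapper, assuming the `hdfs` and `pig` clients are on the PATH. The function names `input_exists` and `load_drivers` are hypothetical helpers made up for illustration; the path and the LOAD statement are the ones from this thread, with the extension corrected.

```shell
#!/bin/sh
# Sketch only: check that the input is really in HDFS before handing it to Pig.
# INPUT is the .csv path that appears in the thread's directory listing.
INPUT=/tmp/demo/piglab04/data/drivers.csv

# `hdfs dfs -test -e PATH` exits 0 when PATH exists, non-zero otherwise.
input_exists() {
    hdfs dfs -test -e "$1"
}

# Run the corrected LOAD/DUMP only when the file is present.
load_drivers() {
    if input_exists "$1"; then
        # `pig -e` executes the quoted Pig Latin statements and exits.
        pig -e "drivers = LOAD '$1' USING PigStorage(','); DUMP drivers;"
    else
        echo "Input path does not exist: $1" >&2
        return 1
    fi
}
```

Calling `load_drivers "$INPUT"` would then run the job, while passing the original .tsv path stops at the existence check instead of failing inside Pig with ERROR 2118.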


13 REPLIES

Super Collaborator

@Fru N. Can you check whether the file is present at the specified location and, if so, what its permissions are?

New Member

Yes, I checked, and the file is present. I gave it 777 permissions.

Since it's Azure, I'm not sure whether there are additional settings I need to change in my hdfs-site.xml.

Super Collaborator

@Fru N., you do not have to change anything in hdfs-site.xml. Can you please send the entire job log?

New Member

@Mushtaq Rizvi

grunt> drivers = LOAD '/tmp/demo/piglab04/data/drivers.tsv' USING PigStorage(',');

grunt> dump drivers;                                                                                                                                                                    
2016-12-08 18:44:08,284 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN                                                               
2016-12-08 18:44:08,386 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.                                              
2016-12-08 18:44:08,451 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-12-08 18:44:08,611 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Tez staging directory is /tmp/fnhdp/staging and resources directory is /tmp/temp1011891030
2016-12-08 18:44:08,659 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.plan.TezCompiler - File concatenation threshold: 100 optimistic? false                           
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/drivers.tsv                     
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)                                                               
        at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateNewSplits(MRInputHelpers.java:412)                                                                                    
        at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:291)                                                                             
        at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.LoaderProcessor.processLoads(LoaderProcessor.java:171)                                                      
        at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.LoaderProcessor.visitTezOp(LoaderProcessor.java:183)                                                        
        at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:249)                                                                               
        at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:55)                                                                                
        at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)                                                                                           
        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)                                                                                                              
        at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.processLoadAndParallelism(TezLauncher.java:461)                                                                
        at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:171)                                                                                
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:304)                                                                          
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1431)                                                                                                                     
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1416)                                                                                                     
        at org.apache.pig.PigServer.storeEx(PigServer.java:1075)                                                                                                                        
        at org.apache.pig.PigServer.store(PigServer.java:1038)                                                                                                                          
        at org.apache.pig.PigServer.openIterator(PigServer.java:951)                                                                                                                    
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)                                                                                                     
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)                                                                                        
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)                                                                                                
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)                                                                                                
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)                                                                                                                          
        at org.apache.pig.Main.run(Main.java:565)                                                                                                                                       
        at org.apache.pig.Main.main(Main.java:177)                                                                                                                                      
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)                                                                                                                  
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)                                                                                                
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)                                                                                        
        at java.lang.reflect.Method.invoke(Method.java:606)                                                                                                                             
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)                                                                                                                           
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)                                                                                                                          
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/drivers.tsv               
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)                                                                     
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)                                                                                   
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)                                                       
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)                                                                                    
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)                                                               
        ... 29 more                                                                                                                                                                     
2016-12-08 18:44:08,981 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2118: Input path does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/drivers.tsv    

Details at logfile: /home/fnhdp/pig_1481222529518.log

New Member

@Mushtaq Rizvi Note: the file does exist. I have confirmed that.

Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/drivers.tsv

Super Collaborator

It says that the input path does not exist. Can you also show the output of this command:

hdfs dfs -ls /tmp/demo/piglab04/data/

New Member
hadoop fs -ls hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data
Found 3 items
-rwxrwxrwx   3 maria_dev hdfs       2043 2016-12-08 16:05 hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/drivers.csv
-rwxrwxrwx   3 maria_dev hdfs      26205 2016-12-08 16:05 hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/timesheet.csv
-rwxrwxrwx   3 maria_dev hdfs    2272077 2016-12-08 16:05 hdfs://sandbox.hortonworks.com:8020/tmp/demo/piglab04/data/truck_event_text_partition.csv

New Member
hdfs dfs -ls /tmp/demo/piglab04/data/
Found 3 items
-rwxrwxrwx   3 maria_dev hdfs       2043 2016-12-08 16:05 /tmp/demo/piglab04/data/drivers.csv
-rwxrwxrwx   3 maria_dev hdfs      26205 2016-12-08 16:05 /tmp/demo/piglab04/data/timesheet.csv
-rwxrwxrwx   3 maria_dev hdfs    2272077 2016-12-08 16:05 /tmp/demo/piglab04/data/truck_event_text_partition.csv
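
The listing shows drivers.csv where the LOAD statement asked for drivers.tsv. As a rough sketch of how to spot that kind of near miss, the hypothetical helper below (the name `candidates` is an assumption, not part of any Hadoop tool) lists the directory of the rejected path and prints entries that share its base name regardless of extension. It assumes paths without spaces, since it takes the last whitespace-separated field of each `hdfs dfs -ls` row.

```shell
#!/bin/sh
# Sketch only: given a path Pig rejected, print files in the same HDFS
# directory that have the same name but possibly a different extension.
candidates() {
    dir=$(dirname "$1")      # e.g. /tmp/demo/piglab04/data
    stem=$(basename "$1")    # e.g. drivers.tsv
    stem=${stem%.*}          # strip the extension -> drivers
    # The last field of each listing row is the full path; the
    # "Found N items" header has no slash and is filtered out by grep.
    hdfs dfs -ls "$dir" | awk '{print $NF}' | grep "/$stem\."
}
```

With the directory contents above, `candidates /tmp/demo/piglab04/data/drivers.tsv` should print the drivers.csv path, which is the name to use in the LOAD statement.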

New Member

For reference, I'm using the Hortonworks Sandbox with HDP 2.4 on Microsoft Azure.