Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive table doesn't detect input HDFS location because of "_"

avatar
Contributor

I have a scenario where i'm trying to create a table which points to an HDFS location which has a directory name starting with an "_" in the HDFS path. Now table creation goes through but If I try to read data out of the table it throws error, below is what i get:

 

create external table `ingest.workgroup__views2` row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' location 'hdfs://nameservice1/user/data/ingest/mdm/workgroup_i/workgroup/_views' tblproperties ('avro.schema.url'='hdfs://nameservice1/user/data/ingest/mdm/workgroup_i/workgroup/_views/_gen/_views.avsc');
No rows affected (0.232 seconds)
0: jdbc:hive2://t-hive.sys.cigna.com:25006/de> select * from ingest.workgroup__views2;
Error: java.io.IOException: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://nameservice1/user/data/ingest/mdm/workgroup_i/workgroup/_views (state=,code=0)
0: jdbc:hive2://t-hive.sys.cigna.com:25006/de> drop table ingest.workgroup__views2;

 

So i escape the special character "_" in location and the table gets created and i' able to run select to see data as below:

 

create external table `ingest.workgroup__views2` row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' location 'hdfs://nameservice1/user/data/ingest/mdm/workgroup_i/workgroup/\_views' tblproperties ('avro.schema.url'='hdfs://nameservice1/user/data/ingest/mdm/workgroup_i/workgroup/_views/_gen/_views.avsc');
No rows affected (0.19 seconds)
0: jdbc:hive2://t-hive.sys.cigna.com:25006/de> select * from ingest.workgroup__views2;
+-----------------------+-------------------------+-----------------------------+-------------------------------+-----------------------------+-------------------------------+--------------------------------+--------------------------+--------------------------+----------------------------+----------------------------+--+
| workgroup__views2.id | workgroup__views2.name | workgroup__views2.view_url | workgroup__views2.created_at | workgroup__views2.owner_id | workgroup__views2.owner_name | workgroup__views2.workbook_id | workgroup__views2.index | workgroup__views2.title | workgroup__views2.caption | workgroup__views2.site_id |
+-----------------------+-------------------------+-----------------------------+-------------------------------+-----------------------------+-------------------------------+--------------------------------+--------------------------+--------------------------+----------------------------+----------------------------+--+
+-----------------------+-------------------------+-----------------------------+-------------------------------+-----------------------------+-------------------------------+--------------------------------+--------------------------+--------------------------+----------------------------+----------------------------+--+
No rows selected (0.139 seconds)

 

Now the weird part is its only the location part which has this issue, parsing of URI mentioned under tblproperties goes through as you can see above and if I explicitly try to escape "_" in tblproperties it doesn't work. 

 

Any comments or suggestions will be helpful on the above obesrvation.

 

Regards

1 ACCEPTED SOLUTION

avatar
Expert Contributor

@Prav ,

 

This appears to have been listed as a bug (which is actually a longstanding limitation due to the definition of files and directories with _ and . being considered as "hidden" in FileInputFormat in Hadoop) of Hive since the 0.12 version:

 

https://issues.apache.org/jira/browse/HIVE-6431

https://stackoverflow.com/questions/19830264/which-files-are-ignored-as-input-by-mapper

 

If these files are needed to be seen, please consider using a pre-process script to rename them after loading.

 

Thanks,



Robert Justice, Technical Resolution Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Learn more about the Cloudera Community:

Terms of Service

View solution in original post

2 REPLIES 2

avatar
Contributor

Hey Network,

 

Anyone had this issue or maybe Cloudera team in this community may share if this a known bug etc?

 

Regards

 

avatar
Expert Contributor

@Prav ,

 

This appears to have been listed as a bug (which is actually a longstanding limitation due to the definition of files and directories with _ and . being considered as "hidden" in FileInputFormat in Hadoop) of Hive since the 0.12 version:

 

https://issues.apache.org/jira/browse/HIVE-6431

https://stackoverflow.com/questions/19830264/which-files-are-ignored-as-input-by-mapper

 

If these files are needed to be seen, please consider using a pre-process script to rename them after loading.

 

Thanks,



Robert Justice, Technical Resolution Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Learn more about the Cloudera Community:

Terms of Service