
Hive table doesn't detect input HDFS location because of "_"


I'm trying to create a table that points to an HDFS location containing a directory whose name starts with "_". Table creation goes through, but when I try to read data from the table it throws an error. Below is what I get:

 

create external table `ingest.workgroup__views2`
row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
location 'hdfs://nameservice1/user/data/ingest/mdm/workgroup_i/workgroup/_views'
tblproperties ('avro.schema.url'='hdfs://nameservice1/user/data/ingest/mdm/workgroup_i/workgroup/_views/_gen/_views.avsc');
No rows affected (0.232 seconds)
0: jdbc:hive2://t-hive.sys.cigna.com:25006/de> select * from ingest.workgroup__views2;
Error: java.io.IOException: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://nameservice1/user/data/ingest/mdm/workgroup_i/workgroup/_views (state=,code=0)
0: jdbc:hive2://t-hive.sys.cigna.com:25006/de> drop table ingest.workgroup__views2;

 

So I escaped the special character "_" in the location. With that change the table gets created and I'm able to run a select against it, as below:

 

create external table `ingest.workgroup__views2`
row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
location 'hdfs://nameservice1/user/data/ingest/mdm/workgroup_i/workgroup/\_views'
tblproperties ('avro.schema.url'='hdfs://nameservice1/user/data/ingest/mdm/workgroup_i/workgroup/_views/_gen/_views.avsc');
No rows affected (0.19 seconds)
0: jdbc:hive2://t-hive.sys.cigna.com:25006/de> select * from ingest.workgroup__views2;
+-----------------------+-------------------------+-----------------------------+-------------------------------+-----------------------------+-------------------------------+--------------------------------+--------------------------+--------------------------+----------------------------+----------------------------+--+
| workgroup__views2.id | workgroup__views2.name | workgroup__views2.view_url | workgroup__views2.created_at | workgroup__views2.owner_id | workgroup__views2.owner_name | workgroup__views2.workbook_id | workgroup__views2.index | workgroup__views2.title | workgroup__views2.caption | workgroup__views2.site_id |
+-----------------------+-------------------------+-----------------------------+-------------------------------+-----------------------------+-------------------------------+--------------------------------+--------------------------+--------------------------+----------------------------+----------------------------+--+
+-----------------------+-------------------------+-----------------------------+-------------------------------+-----------------------------+-------------------------------+--------------------------------+--------------------------+--------------------------+----------------------------+----------------------------+--+
No rows selected (0.139 seconds)

 

Now the weird part is that only the location has this issue. The URI given under tblproperties is parsed without any problem, as you can see above, and if I explicitly try to escape "_" in tblproperties it doesn't work.

 

Any comments or suggestions on the above observation would be helpful.

 

Regards

Accepted Solution

Re: Hive table doesn't detect input HDFS location because of "_"

@Prav ,

 

This appears to have been reported as a Hive bug since version 0.12. It is actually a longstanding limitation: FileInputFormat in Hadoop treats files and directories whose names start with _ or . as "hidden" and does not read them as input. See:

 

https://issues.apache.org/jira/browse/HIVE-6431

https://stackoverflow.com/questions/19830264/which-files-are-ignored-as-input-by-mapper
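
To make the rule concrete, here is a minimal Python sketch of the naming check described above. It is only an approximation for illustration, not Hadoop's actual code; the function names `is_hidden` and `visible_inputs` and the second path (`.../views`) are made up for the example.

```python
from pathlib import PurePosixPath


def is_hidden(name):
    """Return True if a file or directory name counts as "hidden".

    Approximates the rule described above: names that start with
    '_' or '.' are skipped as input.
    """
    return name.startswith(("_", "."))


def visible_inputs(paths):
    """Keep only the paths whose last component is not hidden."""
    return [p for p in paths if not is_hidden(PurePosixPath(p).name)]


# The first path is the table location from the question; the second is a
# hypothetical renamed directory used only for comparison.
candidates = [
    "/user/data/ingest/mdm/workgroup_i/workgroup/_views",
    "/user/data/ingest/mdm/workgroup_i/workgroup/views",
]
print(visible_inputs(candidates))
```

Note that only the leading character of a name matters here: `workgroup_i` has an underscore in the middle and is not affected.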

 

If Hive needs to see these files, please consider using a pre-processing script to rename them after loading.
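
As a rough sketch of what such a rename step could look like, the Python helper below builds an `hdfs dfs -mv` command that strips the leading `_` or `.` from the last path component. The helper name `rename_command` and the target naming scheme are assumptions for illustration. The sketch only builds the command and does not run it, so review the result before executing anything against a real cluster.

```python
import posixpath
from typing import List, Optional


def rename_command(path: str) -> Optional[List[str]]:
    """Build an `hdfs dfs -mv` command that makes a hidden name visible.

    Returns None when the last path component does not start with '_' or '.'.
    The new name is the old one with leading '_' and '.' characters removed
    (an assumed naming scheme, adjust as needed).
    """
    src = path.rstrip("/")
    parent, name = posixpath.split(src)
    if not name.startswith(("_", ".")):
        return None
    new_name = name.lstrip("_.")
    if not new_name:
        raise ValueError("cannot derive a visible name from %r" % name)
    return ["hdfs", "dfs", "-mv", src, posixpath.join(parent, new_name)]


# Table location taken from the question above.
location = "hdfs://nameservice1/user/data/ingest/mdm/workgroup_i/workgroup/_views"
print(rename_command(location))
```

After a rename like this, the table's location (and any avro.schema.url that sits under the renamed directory) would need to point at the new path.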

 

Thanks,



Robert Justice, Technical Resolution Manager



Re: Hive table doesn't detect input HDFS location because of "_"


Hey Network,

 

Has anyone run into this issue, or could someone from the Cloudera team in this community confirm whether this is a known bug?

 

Regards

 
