Support Questions

Find answers, ask questions, and share your expertise

Who agreed with this topic

Hive table doesn't detect input HDFS location because of "_"

avatar
Contributor

I have a scenario where i'm trying to create a table which points to an HDFS location which has a directory name starting with an "_" in the HDFS path. Now table creation goes through but If I try to read data out of the table it throws error, below is what i get:

 

create external table `ingest.workgroup__views2` row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' location 'hdfs://nameservice1/user/data/ingest/mdm/workgroup_i/workgroup/_views' tblproperties ('avro.schema.url'='hdfs://nameservice1/user/data/ingest/mdm/workgroup_i/workgroup/_views/_gen/_views.avsc');
No rows affected (0.232 seconds)
0: jdbc:hive2://t-hive.sys.cigna.com:25006/de> select * from ingest.workgroup__views2;
Error: java.io.IOException: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://nameservice1/user/data/ingest/mdm/workgroup_i/workgroup/_views (state=,code=0)
0: jdbc:hive2://t-hive.sys.cigna.com:25006/de> drop table ingest.workgroup__views2;

 

So i escape the special character "_" in location and the table gets created and i' able to run select to see data as below:

 

create external table `ingest.workgroup__views2` row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' location 'hdfs://nameservice1/user/data/ingest/mdm/workgroup_i/workgroup/\_views' tblproperties ('avro.schema.url'='hdfs://nameservice1/user/data/ingest/mdm/workgroup_i/workgroup/_views/_gen/_views.avsc');
No rows affected (0.19 seconds)
0: jdbc:hive2://t-hive.sys.cigna.com:25006/de> select * from ingest.workgroup__views2;
+-----------------------+-------------------------+-----------------------------+-------------------------------+-----------------------------+-------------------------------+--------------------------------+--------------------------+--------------------------+----------------------------+----------------------------+--+
| workgroup__views2.id | workgroup__views2.name | workgroup__views2.view_url | workgroup__views2.created_at | workgroup__views2.owner_id | workgroup__views2.owner_name | workgroup__views2.workbook_id | workgroup__views2.index | workgroup__views2.title | workgroup__views2.caption | workgroup__views2.site_id |
+-----------------------+-------------------------+-----------------------------+-------------------------------+-----------------------------+-------------------------------+--------------------------------+--------------------------+--------------------------+----------------------------+----------------------------+--+
+-----------------------+-------------------------+-----------------------------+-------------------------------+-----------------------------+-------------------------------+--------------------------------+--------------------------+--------------------------+----------------------------+----------------------------+--+
No rows selected (0.139 seconds)

 

Now the weird part is its only the location part which has this issue, parsing of URI mentioned under tblproperties goes through as you can see above and if I explicitly try to escape "_" in tblproperties it doesn't work. 

 

Any comments or suggestions will be helpful on the above obesrvation.

 

Regards

Who agreed with this topic