Created 06-03-2018 06:59 AM
Hi,
We are copying files from our upstream system which are in JSON GZ format. They are following a pattern for very daily slice say YYYYMMDDHH (2018053100) they are maintianing two folders DATA and METADATA. Where DATA holds actual data and METADATA holds RowCount of that day's data.We need to create external table on top of copied data, where it only consider *.json.gz extension files only, excluding other file extensions. We dont want to copy files into another location since they are large in size. We also tried INPUT_ FILENAME .... virtual column, it didn't work. Any suggesstion for this scnearios ?
Created 06-04-2018 04:03 PM
Hey @Mahender S!
You can try a solution with hive + regex, using RegexSerde
Here's an example:
CREATE EXTERNAL TABLE Vini (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ( "input.regex" = "*.json.gz") LOCATION 'hdfs://myHDFSwithHA/user/hive/warehouse/test.db/vini'Hope this helps ! 🙂
Created 06-04-2018 04:03 PM
Hey @Mahender S!
You can try a solution with hive + regex, using RegexSerde
Here's an example:
CREATE EXTERNAL TABLE Vini (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ( "input.regex" = "*.json.gz") LOCATION 'hdfs://myHDFSwithHA/user/hive/warehouse/test.db/vini'Hope this helps ! 🙂
Created 06-20-2018 12:48 AM
Thanks Vinicius. It worked.
Created 06-20-2018 12:57 AM
Awesome @Mahender S
As the issue is resolved, hence it will be also great if you can mark this HCC thread as Answered by clicking on the "Accept" Button. That way other HCC users can quickly find the solution when they encounter the same issue.!
Thanks 🙂