Support Questions

Find answers, ask questions, and share your expertise

Any way to create External Table with Specific Set of Files

avatar

Hi,

We are copying files from our upstream system which are in JSON GZ format. They are following a pattern for very daily slice say YYYYMMDDHH (2018053100) they are maintianing two folders DATA and METADATA. Where DATA holds actual data and METADATA holds RowCount of that day's data.We need to create external table on top of copied data, where it only consider *.json.gz extension files only, excluding other file extensions. We dont want to copy files into another location since they are large in size. We also tried INPUT_ FILENAME .... virtual column, it didn't work. Any suggesstion for this scnearios ?

1 ACCEPTED SOLUTION

avatar

Hey @Mahender S!
You can try a solution with hive + regex, using RegexSerde
Here's an example:

CREATE EXTERNAL TABLE Vini (name STRING) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
WITH SERDEPROPERTIES  ( "input.regex" = "*.json.gz")
LOCATION 'hdfs://myHDFSwithHA/user/hive/warehouse/test.db/vini'
Hope this helps ! 🙂

View solution in original post

3 REPLIES 3

avatar

Hey @Mahender S!
You can try a solution with hive + regex, using RegexSerde
Here's an example:

CREATE EXTERNAL TABLE Vini (name STRING) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
WITH SERDEPROPERTIES  ( "input.regex" = "*.json.gz")
LOCATION 'hdfs://myHDFSwithHA/user/hive/warehouse/test.db/vini'
Hope this helps ! 🙂

avatar

Thanks Vinicius. It worked.

avatar

Awesome @Mahender S
As the issue is resolved, hence it will be also great if you can mark this HCC thread as Answered by clicking on the "Accept" Button. That way other HCC users can quickly find the solution when they encounter the same issue.!
Thanks 🙂