Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

HIVE External Tables

avatar
New Contributor

Here is the HDFS folder structure - /FLIGHT/2019/03/01/XYZ.tsv, /FLIGHT/2019/03/02/XYZ.tsv, /FLIGHT/2019/03/03/XYZ.tsv etc. While declaring HIVE external table LOCATION can we use something like /FLIGHT/{*}/{*}/{*}XYZ.tsv.

Basically, we just have to read all files that match that specific pattern. Thanks for your feedback.

1 ACCEPTED SOLUTION

avatar

You just need to align your LOCATION clause of EXTERNALS TABLE's DDL to point to your /FLIGHT folder. Hive will crawl all the subfolders.

You might also consider using PARTITION BY and instead of having folders for year, month a day. This let's you do things like WHERE my_partition_col > '19991115' AND my_partition_col < '20010215' which would be much tougher if you partition by specific year, month, and day values.

View solution in original post

1 REPLY 1

avatar

You just need to align your LOCATION clause of EXTERNALS TABLE's DDL to point to your /FLIGHT folder. Hive will crawl all the subfolders.

You might also consider using PARTITION BY and instead of having folders for year, month a day. This let's you do things like WHERE my_partition_col > '19991115' AND my_partition_col < '20010215' which would be much tougher if you partition by specific year, month, and day values.