Support Questions

Find answers, ask questions, and share your expertise

HIVE External Tables

avatar
New Contributor

Here is the HDFS folder structure - /FLIGHT/2019/03/01/XYZ.tsv, /FLIGHT/2019/03/02/XYZ.tsv, /FLIGHT/2019/03/03/XYZ.tsv etc. While declaring HIVE external table LOCATION can we use something like /FLIGHT/{*}/{*}/{*}XYZ.tsv.

Basically, we just have to read all files that match that specific pattern. Thanks for your feedback.

1 ACCEPTED SOLUTION

avatar

You just need to align your LOCATION clause of EXTERNALS TABLE's DDL to point to your /FLIGHT folder. Hive will crawl all the subfolders.

You might also consider using PARTITION BY and instead of having folders for year, month a day. This let's you do things like WHERE my_partition_col > '19991115' AND my_partition_col < '20010215' which would be much tougher if you partition by specific year, month, and day values.

View solution in original post

1 REPLY 1

avatar

You just need to align your LOCATION clause of EXTERNALS TABLE's DDL to point to your /FLIGHT folder. Hive will crawl all the subfolders.

You might also consider using PARTITION BY and instead of having folders for year, month a day. This let's you do things like WHERE my_partition_col > '19991115' AND my_partition_col < '20010215' which would be much tougher if you partition by specific year, month, and day values.