
Hive external table partition update automatically


Hi All,


I have created an external table pointing to an HDFS location where daily log data is stored.


 location: /user/data/year=2021/

partition: month and day


hdfs dfs -ls /user/data/year=2021/
Found 5 items
drwxr-xr-x - user user 0 2021-03-19 16:53 /user/data/year=2021/month=03/day=18
drwxr-xr-x - user user 0 2021-03-20 16:04 /user/data/year=2021/month=03/day=19
drwxr-xr-x - user user 0 2021-03-21 16:59 /user/data/year=2021/month=03/day=20
drwxr-xr-x - user user 0 2021-03-22 16:57 /user/data/year=2021/month=03/day=21


Is there a way for my external table's partitions to be updated automatically when a new file gets added to the HDFS path?


Currently, the table only gets updated when I run the repair manually:
hive> msck repair table <table_name>


Please let me know if there is any way to update the table automatically when the location gets updated.


Thank You!





Can someone please help me with this?


The "msck repair table ..." command does not actually read the new data files; it adds new partitions (subdirectories in HDFS) to the table metadata.
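For reference, a single new directory can also be registered explicitly instead of running a full repair. A sketch in Hive DDL, where <table_name> and the month/day values are just placeholders based on the listing in the question:

```sql
-- Register one specific partition in the metastore.
-- <table_name> and the partition values are illustrative.
ALTER TABLE <table_name> ADD IF NOT EXISTS
  PARTITION (month='03', day='22')
  LOCATION '/user/data/year=2021/month=03/day=22';
```

This touches only the one partition, so it is cheaper than "msck repair table" when you already know which directory was added.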

What you could do is create all the partitions in advance (for a month or more), initially empty, and run the "repair" command just once:

hdfs dfs -mkdir -p /user/data/year=2021/month=04/day=01
...
hdfs dfs -mkdir -p /user/data/year=2021/month=04/day=30

hive> msck repair table <table_name>
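The per-day mkdir calls can be generated in a loop rather than typed by hand. A minimal bash sketch, assuming `seq` is available; it only prints the commands (a dry run you can inspect anywhere), and <table_name> is a placeholder:

```shell
#!/usr/bin/env bash
# Build one "hdfs dfs -mkdir -p" command per day of the month.
# seq -w zero-pads the numbers, so the directory names match the
# existing day=18 / day=19 layout from the listing above.
cmds=$(for day in $(seq -w 1 30); do
  echo "hdfs dfs -mkdir -p /user/data/year=2021/month=04/day=$day"
done)

# Dry run: print the commands instead of executing them.
echo "$cmds"

# On a real cluster you would run them and then register the partitions once:
#   echo "$cmds" | bash
#   hive -e "msck repair table <table_name>"
```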


When you put your log files into one of these directories, they will be immediately visible from Hive (just set the correct permissions using Ranger or hdfs).


You could repeat these operations (create directories and "repair table") during log maintenance, since you should already have some policy in place for removing old logs.
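If that maintenance step needs to be hands-off, one common approach (an assumption on my part, not something required by Hive) is to schedule the repair with cron, for example nightly:

```shell
# Illustrative crontab entry: run "msck repair table" every night at 00:30.
# Adjust the schedule, hive invocation, and table name to your environment.
30 0 * * * hive -e "msck repair table <table_name>"
```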


Hope this helps