Hive external table partition update automatically


Hi All,

 

I have created an external table pointing to an HDFS location where daily log data is stored.

Details: 

 location: /user/data/year=2021/

partition: month and day

 

hdfs dfs -ls /user/data/year=2021/
Found 5 items
drwxr-xr-x - user user 0 2021-03-19 16:53 /user/data/year=2021/month=03/day=18
drwxr-xr-x - user user 0 2021-03-20 16:04 /user/data/year=2021/month=03/day=19
drwxr-xr-x - user user 0 2021-03-21 16:59 /user/data/year=2021/month=03/day=20
drwxr-xr-x - user user 0 2021-03-22 16:57 /user/data/year=2021/month=03/day=21

 

Is there a way for my external table's partitions to be updated automatically when a new file is added to the HDFS path?

 

Right now, the table is only updated when I run this manually:
hive> msck repair table <table_name>;

 

Please let me know if there is any way to update the table automatically when the location is updated.

 

Thank You!

2 REPLIES


Hi,

 

Can someone please help me with this?


The "msck repair table ..." command does not read the new data files themselves; it registers new partitions (subdirectories in HDFS) in the table metadata.

What you could do is create all the partitions in advance (for a month or more), initially empty, and run the "repair" command just once:

hdfs dfs -mkdir /user/data/year=2021/month=04/day=1

...

hdfs dfs -mkdir /user/data/year=2021/month=04/day=30

hive> msck repair table <table_name>;
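
The directory creation above can be scripted rather than typed out day by day. The following is only a minimal sketch: the helper name partition_dirs is hypothetical, the day value is left unpadded to match the commands above (adjust the printf format if your existing partitions use two digits, as in day=18), and the commented usage lines assume the hdfs and hive clients are on the PATH.

```shell
# Hypothetical helper: print one partition directory path per day.
# Arguments: base path, month value, number of days in that month.
partition_dirs() {
  base=$1
  month=$2
  days=$3
  d=1
  while [ "$d" -le "$days" ]; do
    # Day is printed unpadded (day=1), as in the commands above.
    printf '%s/month=%s/day=%s\n' "$base" "$month" "$d"
    d=$((d + 1))
  done
}

# Usage sketch (not executed here; requires a working Hadoop/Hive client):
#   partition_dirs /user/data/year=2021 04 30 | xargs hdfs dfs -mkdir -p
#   hive -e 'msck repair table <table_name>'
```

The helper only prints paths, so you can inspect the list before piping it to hdfs dfs -mkdir.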

 

When you put your log files into one of these directories, they are immediately visible from Hive (just set the correct permissions using Ranger or HDFS).

 

You could repeat these operations (creating the directories and running "repair table") as part of your log maintenance, since you presumably have a policy for removing old logs.

 

Hope this helps.