Hive external table partition update automatically
Created ‎03-23-2021 12:10 PM
Hi All,
I have created an external table pointing to an HDFS location where data for everyday logs gets stored.
Details:
location: /user/data/year=2021/
partition: month and day
hdfs dfs -ls /user/data/year=2021/
Found 5 items
drwxr-xr-x - user user 0 2021-03-19 16:53 /user/data/year=2021/month=03/day=18
drwxr-xr-x - user user 0 2021-03-20 16:04 /user/data/year=2021/month=03/day=19
drwxr-xr-x - user user 0 2021-03-21 16:59 /user/data/year=2021/month=03/day=20
drwxr-xr-x - user user 0 2021-03-22 16:57 /user/data/year=2021/month=03/day=21
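A table over this layout might be declared along these lines (the table name, column, and storage format here are placeholder assumptions, not the actual DDL):

```sql
-- Sketch only: names and format are assumptions, not the actual table DDL.
-- Partition columns are strings so values like month=03 match the paths.
CREATE EXTERNAL TABLE daily_logs (
  log_line STRING
)
PARTITIONED BY (month STRING, day STRING)
STORED AS TEXTFILE
LOCATION '/user/data/year=2021/';
```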
Is there a way for my external table partitions to get updated automatically when a new file gets added to the HDFS path?
For now, the table gets updated when I run the repair manually:
hive> msck repair table <table_name>
Please let me know if there is any way to update the table automatically when the location gets updated.
Thank You!
Created ‎03-30-2021 01:48 PM
Hi,
Can someone please help me with this?
Created ‎03-31-2021 07:59 AM
The "msck repair table ..." command does not actually read new data files; it adds new partitions (HDFS subdirectories) to the table metadata.
What you could do is create all the partitions in advance (for a month or more), initially empty, and run the "repair" command just once:
hdfs dfs -mkdir -p /user/data/year=2021/month=04/day=01
...
hdfs dfs -mkdir /user/data/year=2021/month=04/day=30
hive>msck repair table <table_name>
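The pre-create step can be scripted instead of typed out by hand. A minimal sketch (the base path is taken from the listing in the question; 30 days assumed for April):

```shell
# Print the mkdir commands for every day of April 2021.
# seq -w zero-pads the day numbers so they match the month=03/day=18 style.
base="/user/data/year=2021/month=04"
for day in $(seq -w 1 30); do
  echo "hdfs dfs -mkdir -p ${base}/day=${day}"
done
```

Pipe the output to `sh` on a node with HDFS access (or drop the `echo`), then run the "msck repair table" command once.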
When you put your log files inside one of these directories, they will be immediately visible from Hive (just set the correct permissions using Ranger or plain HDFS permissions).
You can repeat these operations (create directories and "repair table") during log maintenance, since you should already have policies in place to remove old logs.
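If directories arrive daily, the repeat itself can be scheduled. A sketch of a cron entry (the schedule and table name are assumptions, not from this thread):

```shell
# Runs daily at 01:00; your_table_name is a placeholder.
# Note: % must be escaped as \% inside crontab entries.
0 1 * * * hdfs dfs -mkdir -p /user/data/year=$(date +\%Y)/month=$(date +\%m)/day=$(date +\%d) && hive -e "MSCK REPAIR TABLE your_table_name"
```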
Hope this helps
