Manual Hive replication - metastore database!

Expert Contributor

Hi all,

I use DistCp every X minutes to transfer the HDFS data to a hot backup cluster. Should I replicate the whole Hive metastore database (manually or using database HA) to accomplish the backup/restore, or do I only need to import/export some specific Hive metastore tables?

Thanks in advance.





I think it purely depends on your business requirements, but the following may help.


I don't think a full Hive metastore database backup is required every X minutes unless you have a specific use case. You can follow this approach:


1. Take a full Hive metastore DB backup once.

2. Every X minutes, back up only the delta (impacted) tables on top of it.


This keeps your backup metastore up to date.
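A minimal sketch of these two steps, assuming a PostgreSQL-backed metastore, could look like the following. The database name, host, user and output paths are hypothetical placeholders, and deciding which tables count as "delta" is left to the caller. The functions only build pg_dump command lines; you would run them yourself, for example with subprocess.run(cmd, check=True).

```python
from datetime import datetime, timezone

# Hypothetical connection settings: adjust for your environment.
DB_NAME = "metastore"
DB_HOST = "metastore-db.example.com"
DB_USER = "hive"


def _stamp(now: datetime) -> str:
    """Format a UTC timestamp for use in backup file names."""
    return now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def full_backup_cmd(out_dir: str, now: datetime) -> list[str]:
    """Step 1: pg_dump command that dumps the whole metastore database once."""
    return [
        "pg_dump", "-h", DB_HOST, "-U", DB_USER,
        "-Fc",  # custom-format archive, restorable with pg_restore
        "-f", f"{out_dir}/metastore_full_{_stamp(now)}.dump",
        DB_NAME,
    ]


def delta_backup_cmd(out_dir: str, now: datetime, tables: list[str]) -> list[str]:
    """Step 2: pg_dump command that dumps only the given (impacted) tables."""
    cmd = [
        "pg_dump", "-h", DB_HOST, "-U", DB_USER, "-Fc",
        "-f", f"{out_dir}/metastore_delta_{_stamp(now)}.dump",
    ]
    for table in tables:
        # Metastore tables typically have upper-case names (e.g. TBLS); pg_dump
        # folds unquoted -t patterns to lower case, so we double-quote them.
        cmd += ["-t", f'"{table}"']
    cmd.append(DB_NAME)
    return cmd
```

Restoring only a subset of metastore tables on top of an older full dump may leave foreign-key references between tables inconsistent, so the delta step is worth testing with a real restore before relying on it.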

Expert Contributor

Thanks @saranvisa for your response.

Yes, but the question is how to determine which of the roughly 56 tables in the metastore database are the delta/impacted tables that need to be backed up.

Thanks again.




I don't think there is an easy way to identify the delta/impacted tables among the 56 tables.


Here is a high-level idea; you can refine it as needed:

1. Create a sample table.

2. Check the impact on the underlying HDFS folder whenever a DDL change is applied to it. Alternatively, if you can capture the date (or some other attribute) from the DESCRIBE FORMATTED output of the table before and after the DDL change, that is a good option.

3. Write a script that takes the 56 tables as a parameter. The script simply runs DESCRIBE FORMATTED for each given table and identifies what changed compared to the previous run X hours earlier.

4. Run the above script every X hours, before you take a backup.
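The script idea in steps 3 and 4 could be sketched roughly as follows. It assumes you have already collected the DESCRIBE FORMATTED output for each table (for example with hive -e or beeline -e) into a dictionary keyed by table name, and the snapshot file path is a hypothetical choice. Rather than parsing a specific date field, this sketch hashes the whole output, so any change in it, including statistics updates, flags the table.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(describe_output: str) -> str:
    """Hash the DESCRIBE FORMATTED text so any change in it changes the digest."""
    return hashlib.sha256(describe_output.encode("utf-8")).hexdigest()


def changed_tables(current: dict[str, str], previous: dict[str, str]) -> list[str]:
    """Return tables whose digest differs from, or is missing in, the previous run."""
    return sorted(
        table for table, digest in current.items()
        if previous.get(table) != digest
    )


def run_check(outputs: dict[str, str], state_file: Path) -> list[str]:
    """Compare this run against the saved snapshot, then save the new snapshot."""
    current = {table: fingerprint(text) for table, text in outputs.items()}
    previous = json.loads(state_file.read_text()) if state_file.exists() else {}
    state_file.write_text(json.dumps(current, indent=2, sort_keys=True))
    return changed_tables(current, previous)
```

On the first run every table is reported as changed, since there is no previous snapshot to compare against.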

Expert Contributor


Thank you, I think your idea would work well in several use cases for identifying modified Hive/Impala tables.

But I think there is a misunderstanding: by the 56 tables I mean the tables of the metastore (PostgreSQL) database itself. My backup approach is based on two steps:
1. Back up the updated HDFS table data with DistCp (done).
2. Back up the metastore (PostgreSQL) database (the current question).
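Since the 56 tables are PostgreSQL tables, one possible way to spot the impacted ones (not something proposed earlier in this thread) is PostgreSQL's pg_stat_user_tables view, whose n_tup_ins, n_tup_upd and n_tup_del columns count inserted, updated and deleted rows per table. The sketch below assumes the metastore tables live in the public schema and that you store the previous run's counters yourself; it only compares two snapshots of those counters.

```python
# Query to run against the metastore database (e.g. via psql or a driver).
# Assumption: the metastore tables are in the 'public' schema.
CHANGE_COUNTER_SQL = """
SELECT relname, n_tup_ins + n_tup_upd + n_tup_del AS changes
FROM pg_stat_user_tables
WHERE schemaname = 'public'
"""


def tables_to_back_up(current: dict[str, int], previous: dict[str, int]) -> list[str]:
    """Return metastore tables whose cumulative write counter moved since last run.

    Any difference counts as a change: a higher value means new writes, and a
    lower value suggests the statistics were reset, so the table is included
    to stay on the safe side. Tables missing from the previous run are included.
    """
    changed = []
    for table, count in current.items():
        before = previous.get(table)
        if before is None or count != before:
            changed.append(table)
    return sorted(changed)
```

These counters are statistics rather than a change log: they are cleared by pg_stat_reset() and may be reported with a short delay, so a periodic full pg_dump of the metastore database may still be the safer baseline.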
