What are best practices for setting up Backup and Disaster Recovery for Hive in production?

I need best practices for backup and DR for:

- Hive Metastore DB, e.g. MySQL, Postgres etc.

- Hive data

1 ACCEPTED SOLUTION

Rising Star

The other two answers cover Hive Metastore backup well. For the Hive data itself, here are a few options:

Option 1) Hive data is stored in HDFS (Hadoop Distributed File System), so any backup or DR (Disaster Recovery) strategy you have for HDFS can be used for Hive as well. For example, you can use the HDFS snapshot feature to take a point-in-time image. A snapshot can cover the entire file system or a sub-tree of it; snapshots are taken on directories, so to capture a single file you snapshot its parent directory. You can also make backups incremental by computing the diff between two snapshots and copying only the changes.
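
As a rough illustration of the snapshot workflow, here is a minimal Python sketch that only builds the HDFS command lines and prints them. The warehouse path and the snapshot names `s1` and `s2` are assumptions for the example, not values from this thread; check the path against your own cluster before running anything.

```python
import shlex

# Hypothetical warehouse path; adjust to your cluster's layout.
WAREHOUSE = "/apps/hive/warehouse"


def allow_snapshot_cmd(path):
    """Admin command that marks a directory as snapshottable."""
    return ["hdfs", "dfsadmin", "-allowSnapshot", path]


def create_snapshot_cmd(path, name):
    """Command that takes a named, read-only snapshot of the directory."""
    return ["hdfs", "dfs", "-createSnapshot", path, name]


def snapshot_diff_cmd(path, older, newer):
    """Command that reports what changed between two snapshots."""
    return ["hdfs", "snapshotDiff", path, older, newer]


if __name__ == "__main__":
    for cmd in (
        allow_snapshot_cmd(WAREHOUSE),
        create_snapshot_cmd(WAREHOUSE, "s1"),
        create_snapshot_cmd(WAREHOUSE, "s2"),
        snapshot_diff_cmd(WAREHOUSE, "s1", "s2"),
    ):
        # Printed rather than executed; on a cluster node you could
        # pass each list to subprocess.run(cmd, check=True) instead.
        print(shlex.join(cmd))
```

Keep in mind that a snapshot lives on the same cluster as the data, so on its own it protects against accidental deletes rather than against losing the cluster.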

Option 2) You can script your own DistCp jobs and schedule them as part of a Falcon data pipeline.

See: Using DistCp to copy files
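
A minimal sketch of what such a DistCp step might look like, again in Python and again only building the command lines. The NameNode URIs and snapshot names are hypothetical. The incremental form assumes that the source directory has both snapshots and that the target already holds the older snapshot and has not been modified since.

```python
import shlex

# Hypothetical cluster addresses and path, for illustration only.
PRIMARY = "hdfs://primary-nn:8020/apps/hive/warehouse"
DR = "hdfs://dr-nn:8020/apps/hive/warehouse"


def distcp_full_cmd(src, dst):
    """Copy src to dst, skipping files that are already up to date.

    -p preserves file status such as owner, group and permissions.
    """
    return ["hadoop", "distcp", "-update", "-p", src, dst]


def distcp_incremental_cmd(src, dst, older, newer):
    """Copy only the changes between two source snapshots.

    -diff must be combined with -update.
    """
    return ["hadoop", "distcp", "-update", "-diff", older, newer, src, dst]


if __name__ == "__main__":
    print(shlex.join(distcp_full_cmd(PRIMARY, DR)))
    print(shlex.join(distcp_incremental_cmd(PRIMARY, DR, "s1", "s2")))
```

A scheduler (Falcon, or plain cron in this sketch's simplified view) would run the full copy once and the incremental copy on each later cycle.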

Option 3) You can use the Falcon data mirroring capability to mirror the data in HDFS or Hive tables.

See: Falcon Data Mirroring

Option 4) You can run an active-active data load into both your primary cluster and your DR cluster. For example, if you use a Sqoop job to pull data from a particular RDBMS and load it into a Hive table, you can create two Sqoop jobs: one to load the Hive table on the primary cluster and the other to load the Hive table on the DR cluster.
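
To make the dual-load idea concrete, here is a small Python sketch that builds one Sqoop import command and assigns it to an edge node on each cluster. The host names, JDBC URL, user, password file and table names are all hypothetical, and the sketch assumes each cluster has its own edge node with Sqoop configured for that cluster.

```python
import shlex


def sqoop_hive_import_cmd(jdbc_url, username, password_file, table, hive_table):
    """Build a Sqoop import that loads an RDBMS table into a Hive table."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--hive-import",
        "--hive-table", hive_table,
    ]


def dual_load_plan(edge_hosts, **import_args):
    """Map each cluster's edge host to the identical import command to run there."""
    cmd = sqoop_hive_import_cmd(**import_args)
    return {host: list(cmd) for host in edge_hosts}


if __name__ == "__main__":
    plan = dual_load_plan(
        ["primary-edge.example.com", "dr-edge.example.com"],  # hypothetical hosts
        jdbc_url="jdbc:mysql://rdbms.example.com/sales",      # hypothetical source
        username="etl",
        password_file="/user/etl/.sqoop.pw",
        table="orders",
        hive_table="sales.orders",
    )
    for host, cmd in plan.items():
        print(f"{host}: {shlex.join(cmd)}")
```

Both loads read the same source independently, so the two Hive tables can drift if one job fails; some reconciliation check between clusters is worth planning for.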

Your choice of option depends on the SLAs (service level agreements) around DR/backup, budget, skill level etc.

4 REPLIES

For a Hive Metastore on Oracle, Data Guard can be used as the DR solution. See: Oracle Dataguard - Transparent Application Failover

Super Collaborator

For a Hive Metastore on MySQL, you can configure the Hive Metastore service for HA across multiple hosts; MySQL also needs to be configured for active-active replication. More info at High Availability for Hive Metastore.

Backup and restore for the Hive Metastore is covered in 5.1.7. Perform Backups. The backup method we normally use is "mysqldump hive > /tmp/mydir/backup_hive.sql". Note that there are various ways of backing up MySQL databases; the important part is to back up the Hive database (schema). For a full DR solution for MySQL you also need to back up the MySQL config files etc. For a description of MySQL backup and restore, see http://dev.mysql.com/doc/mysql-backup-excerpt/5.7/en/index.html.
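
If you want to script that dump, a minimal Python sketch might look like the following. It only builds the `mysqldump` command line; the timestamped file name is an assumption added for illustration, and credentials are assumed to come from a MySQL option file rather than the command line.

```python
import shlex
from datetime import datetime


def metastore_dump_cmd(database, out_dir, now=None):
    """Build a mysqldump command that writes a timestamped dump file."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    out_file = f"{out_dir}/backup_{database}_{stamp}.sql"
    return [
        "mysqldump",
        # Dump InnoDB tables from one consistent snapshot.
        "--single-transaction",
        f"--result-file={out_file}",
        database,
    ]


if __name__ == "__main__":
    # Printed rather than executed; run it on the metastore DB host.
    print(shlex.join(metastore_dump_cmd("hive", "/tmp/mydir")))
```

Timestamped files let you keep several generations of the dump; copying them off the database host is what turns this into a DR measure.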

Explorer

@Chakra You may also have Hive data declared as an external table whose data sits in a file store outside of HDFS. In that case, as long as you back up your Hive Metastore, you should be fine, assuming the external file store has its own backup and restore policies.