Support Questions
Find answers, ask questions, and share your expertise

What are best practices for setting up Backup and Disaster Recovery for Hive in production?

Solved


Need best practices for backup and DR for:

- the Hive Metastore DB (MySQL, Postgres, etc.)

- Hive data

1 ACCEPTED SOLUTION

Accepted Solutions

Re: What are best practices for setting up Backup and Disaster Recovery for Hive in production?

New Contributor

The two answers above cover Hive Metastore backup well. For the Hive data itself, here are a few options:

Option 1) Hive data is stored in HDFS (Hadoop Distributed File System), so any backup or DR (disaster recovery) strategy you have for HDFS can be used for Hive as well. In particular, you can use the HDFS snapshot feature to take a point-in-time image. A snapshot can cover the entire file system, a sub-tree of the file system, or just a file. You can also do incremental backups by diffing two snapshots and copying only what changed.
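As a rough sketch, the snapshot workflow looks like this (the warehouse path and snapshot names below are placeholders for your own):

```bash
# Enable snapshots on the Hive warehouse directory (HDFS admin command)
hdfs dfsadmin -allowSnapshot /apps/hive/warehouse

# Take a point-in-time snapshot
hdfs dfs -createSnapshot /apps/hive/warehouse snap-day1

# Later, take another snapshot and diff the two to see what changed
hdfs dfs -createSnapshot /apps/hive/warehouse snap-day2
hdfs snapshotDiff /apps/hive/warehouse snap-day1 snap-day2
```

The diff output lists created, deleted, and modified paths between the two snapshots, which is what you feed an incremental copy.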

Option 2) You can write your own DistCp job and make it part of a Falcon data pipeline.

Using Distcp to copy files
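A minimal DistCp invocation for cross-cluster replication might look like the following (NameNode hostnames and paths are placeholders):

```bash
# Copy the Hive warehouse from the primary cluster to the DR cluster.
# -update copies only files that changed; -delete removes files on the
# target that no longer exist on the source.
hadoop distcp -update -delete \
  hdfs://primary-nn:8020/apps/hive/warehouse \
  hdfs://dr-nn:8020/apps/hive/warehouse
```

In practice you would schedule this (e.g. via Falcon or Oozie) rather than run it by hand.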

Option 3) You can use Falcon's data mirroring capability to mirror data in HDFS or Hive tables.

Here is a link on that

Falcon Data Mirroring

Option 4) You can run an active-active data load into both your primary cluster and your DR cluster. For example, if you are using a Sqoop job to pull data from a particular RDBMS and load it into a Hive table, you can create two Sqoop jobs: one to load the primary cluster's Hive table and another to load the DR cluster's Hive table.
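As a sketch, the two jobs are the same import pointed at different clusters (the JDBC URL, credentials, table, and database names are placeholders):

```bash
# Job 1: load the primary cluster's Hive table
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl --password-file /user/etl/.db_pass \
  --table orders \
  --hive-import --hive-table sales.orders

# Job 2: the identical import, run from a gateway node that is
# configured against the DR cluster's HDFS and metastore
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl --password-file /user/etl/.db_pass \
  --table orders \
  --hive-import --hive-table sales.orders
```

The trade-off is double the load on the source RDBMS in exchange for a DR cluster that is always current.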

Your choice of option depends on the SLAs (service-level agreements) around DR/backup, budget, skill level, etc.

4 REPLIES

Re: What are best practices for setting up Backup and Disaster Recovery for Hive in production?

For Hive on Oracle, Data Guard can be used as the DR solution. Refer to: Oracle Dataguard - Transparent Application Failover

Re: What are best practices for setting up Backup and Disaster Recovery for Hive in production?

Expert Contributor

For the Hive Metastore on MySQL: you can configure the Hive Metastore service for HA on multiple boxes, and MySQL also needs to be configured for active-active replication. More info at High Availability for Hive Metastore.
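With multiple metastore instances, clients fail over between the Thrift URIs listed in hive-site.xml. A sketch of that setting (the hostnames are placeholders):

```xml
<!-- hive-site.xml: clients try the listed metastore instances in turn -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore1.example.com:9083,thrift://metastore2.example.com:9083</value>
</property>
```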

Backup/restore for the Hive Metastore is covered in 5.1.7. Perform Backups. The backup method we normally use is "mysqldump hive > /tmp/mydir/backup_hive.sql". Note that there are various ways of backing up MySQL databases; the important part is to back up the Hive database schema. For a full DR solution for MySQL you also need to back up the MySQL config files, etc. For a description of MySQL backup/restore, see http://dev.mysql.com/doc/mysql-backup-excerpt/5.7/en/index.html.
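A slightly fuller version of that dump-and-restore cycle might look like this (assuming the metastore database is named `hive`; credentials and paths are placeholders):

```bash
# Back up the Hive metastore database (schema + data)
mysqldump -u root -p hive > /tmp/mydir/backup_hive.sql

# Restore on the standby server: recreate the database, then load the dump
mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS hive"
mysql -u root -p hive < /tmp/mydir/backup_hive.sql
```

Schedule the dump (e.g. via cron) and copy the file off-host so a metastore server failure does not take the backup with it.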


Re: What are best practices for setting up Backup and Disaster Recovery for Hive in production?

New Contributor

@Chakra You may also have Hive data declared as an external table, in which case the data sits in a file store outside of HDFS. In that case, as long as you back up your Hive metastore you should be fine, assuming the external file store has its own backup and restore policies.
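To illustrate, an external table only records metadata in the metastore; the files at LOCATION belong to (and are backed up by) the external store. The JDBC URL, schema, columns, and bucket below are all placeholders:

```bash
# Define an external table over data living in S3 via beeline
beeline -u jdbc:hive2://hiveserver:10000 -e "
CREATE EXTERNAL TABLE sales.orders_ext (
  order_id BIGINT,
  amount   DOUBLE
)
STORED AS ORC
LOCATION 's3a://my-backup-bucket/warehouse/orders';
"
```

Dropping an external table removes only the metastore entry, not the underlying files, which is why the metastore dump plus the external store's own backups is sufficient.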
