I'm not sure where to post this question; let me know if this is the wrong section.


I have an Oozie bundle containing several coordinators that import data from various sources, apply some transformations, and generate Hive tables. The bundle is scheduled to run once a day.


I need to design a rollback procedure that returns the cluster to its state from the previous day.

I was thinking of adding these two operations before the daily import/transformation tasks start:

  • Take a snapshot (backup) of the Hive data in HDFS and store it in a backup folder
  • Back up the Hive Metastore database (MySQL)
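To make the idea concrete, here is a minimal sketch of those two steps, assuming the Hive data lives under the default warehouse directory `/user/hive/warehouse`, the metastore database is called `metastore`, and MySQL credentials come from an option file such as `~/.my.cnf` (all of these are assumptions; adjust them to your cluster). Instead of copying files into a backup folder, it uses an HDFS snapshot, which is read-only and copy-on-write, so it does not duplicate data until files change. The directory has to be made snapshottable once, as the HDFS superuser, with `hdfs dfsadmin -allowSnapshot /user/hive/warehouse`. The snapshot name `daily-<date>` and the dump path are hypothetical conventions.

```python
import datetime
import shlex
import subprocess
from typing import List

# Hypothetical defaults (assumptions): replace with your cluster's values.
WAREHOUSE = "/user/hive/warehouse"  # Hive warehouse directory in HDFS
METASTORE_DB = "metastore"          # MySQL database used by the Hive Metastore
DUMP_DIR = "/backups"               # local directory for metastore dumps

# One-time setup, as the HDFS superuser, before snapshots can be taken:
#   hdfs dfsadmin -allowSnapshot /user/hive/warehouse


def backup_commands(day: datetime.date,
                    warehouse: str = WAREHOUSE,
                    metastore_db: str = METASTORE_DB,
                    dump_dir: str = DUMP_DIR) -> List[List[str]]:
    """Build (but do not execute) the two daily backup commands for `day`."""
    tag = day.isoformat()
    return [
        # Read-only, copy-on-write snapshot; afterwards it is readable at
        # <warehouse>/.snapshot/daily-<date>.
        ["hdfs", "dfs", "-createSnapshot", warehouse, f"daily-{tag}"],
        # Consistent dump of InnoDB tables without holding table locks;
        # --databases also emits CREATE DATABASE / USE statements.
        ["mysqldump", "--single-transaction", "--databases", metastore_db,
         f"--result-file={dump_dir}/{metastore_db}-{tag}.sql"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print each command; execute it only when dry_run is False."""
    printed = []
    for argv in commands:
        printed.append(shlex.join(argv))
        print(printed[-1])
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure
    return printed
```

Calling `run(backup_commands(datetime.date.today()), dry_run=False)` from the first action of the daily workflow (for example an Oozie shell action) would take both backups before the import coordinators start. Note that tables whose data lives outside the warehouse directory (e.g. external tables) are not covered by this snapshot, and old snapshots need to be cleaned up with `hdfs dfs -deleteSnapshot`.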

Then, when I need to roll back, I can stop the current Oozie bundle, overwrite the Hive data in HDFS with the data in the backup folder, and restore the Hive Metastore database.
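The rollback side could look roughly like the following sketch, under the same assumptions as a daily HDFS snapshot named `daily-<date>` and a dump file `/backups/metastore-<date>.sql` (both hypothetical naming conventions). The bundle ID and Oozie URL are placeholders. It kills the bundle, uses `hadoop distcp -update -delete` to make the warehouse match the snapshot (copying back changed files and deleting files created after the snapshot), and feeds the dump to the `mysql` client. This is one possible approach, not a verified Cloudera procedure; in practice you would probably also want HiveServer2 and the Hive Metastore stopped while the database is being restored.

```python
import datetime
import shlex
import subprocess
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical defaults (assumptions): replace with your cluster's values.
WAREHOUSE = "/user/hive/warehouse"
METASTORE_DB = "metastore"
DUMP_DIR = "/backups"


@dataclass
class Step:
    argv: List[str]
    stdin_path: Optional[str] = None  # file fed to the command's stdin, if any


def rollback_steps(day: datetime.date, bundle_id: str, oozie_url: str,
                   warehouse: str = WAREHOUSE,
                   metastore_db: str = METASTORE_DB,
                   dump_dir: str = DUMP_DIR) -> List[Step]:
    """Build (but do not execute) the steps that restore the state of `day`."""
    tag = day.isoformat()
    snapshot = f"{warehouse}/.snapshot/daily-{tag}"
    dump = f"{dump_dir}/{metastore_db}-{tag}.sql"
    return [
        # 1. Stop the bundle so no coordinator writes during the restore.
        Step(["oozie", "job", "-oozie", oozie_url, "-kill", bundle_id]),
        # 2. Sync the warehouse to the snapshot; -delete removes files that
        #    exist in the warehouse but not in the snapshot.
        Step(["hadoop", "distcp", "-update", "-delete", snapshot, warehouse]),
        # 3. Reload the metastore dump (it drops and recreates the tables).
        Step(["mysql"], stdin_path=dump),
    ]


def run(steps: List[Step], dry_run: bool = True) -> List[str]:
    """Print each step as a shell line; execute it only when dry_run is False."""
    printed = []
    for step in steps:
        line = shlex.join(step.argv)
        if step.stdin_path:
            line += f" < {shlex.quote(step.stdin_path)}"
        printed.append(line)
        print(line)
        if dry_run:
            continue
        if step.stdin_path:
            with open(step.stdin_path, "rb") as f:
                subprocess.run(step.argv, stdin=f, check=True)
        else:
            subprocess.run(step.argv, check=True)
    return printed
```

The order matters: the data and the metastore should only be restored once nothing is writing to them, and both backups should come from the same point in time, otherwise the metastore may reference partitions whose files no longer exist (or the reverse).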

My questions:

  1. Is this going to work? Or are there critical problems that I am not seeing?
  2. Which approach do you guys suggest for supporting rollback in a Cloudera environment?


Thanks for any information.