Created 08-29-2016 04:51 PM
Can someone please share how to use distcp + Oozie (not Falcon) for cluster DR/replication?
My understanding is that the entire distcp job will fail if any file in the path is being written to, and that the best way around that is to run distcp against snapshots. But what is the entire end-to-end process?
Also, what checks can be done on the DR cluster to ensure the success of the job and that the data is synced with the metastore?
Created 08-30-2016 02:55 PM
Here's a good link (and a second good link) that explains how to use snapshots. That's step 1.
Step 2 is backing up the metastore (a backup of your databases).
If you need a link that talks through DR for Hadoop, this is a good one.
I think once you go through those links you will feel better about the process.
High level
Regularly create snapshots of the file system and backups of the databases
Store offsite (with distcp)
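As a rough illustration of the two high-level steps above, here is a minimal Python sketch that only builds the command lines for one replication cycle; it does not run them. The cluster URIs and snapshot names are hypothetical, and it assumes the source directory has already been made snapshottable (hdfs dfsadmin -allowSnapshot). The -diff form of distcp requires -update, and it also requires that the target still has the previous snapshot and has not changed since, which is why the sketch snapshots the target after each copy. Wiring these commands into an Oozie workflow is not shown.

```python
import shlex
from typing import List, Optional


def replication_plan(src: str, dst: str, new_snap: str,
                     prev_snap: Optional[str] = None) -> List[List[str]]:
    """Build the commands for one snapshot-based distcp cycle.

    src and dst are HDFS directory URIs (hypothetical values in the demo).
    Nothing is executed here; the caller decides how to run the commands.
    """
    plan = [
        # 1. Freeze a consistent, read-only view of the source directory.
        ["hdfs", "dfs", "-createSnapshot", src, new_snap],
    ]
    if prev_snap is None:
        # 2a. First cycle: copy the whole snapshot view, so files still
        #     being written under src do not disturb the job.
        plan.append(["hadoop", "distcp", "-update",
                     f"{src}/.snapshot/{new_snap}", dst])
    else:
        # 2b. Later cycles: copy only what changed between two snapshots.
        plan.append(["hadoop", "distcp", "-update", "-diff",
                     prev_snap, new_snap, src, dst])
    # 3. Snapshot the target under the same name so the next -diff run
    #    finds a matching starting point on the DR side.
    plan.append(["hdfs", "dfs", "-createSnapshot", dst, new_snap])
    return plan


if __name__ == "__main__":
    # Hypothetical NameNode addresses and snapshot names.
    for cmd in replication_plan("hdfs://prod-nn:8020/data",
                                "hdfs://dr-nn:8020/data",
                                "s2", prev_snap="s1"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the plan as plain argument lists makes it easy to log, review, or hand to whatever scheduler you use before anything touches either cluster.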
Hadoop has checksums built in, so I'm sure you could write a script to cross-check that the backup cluster has the correct files. But really, why not practice recovering the data into the [backup] cluster as part of your process? You could then run some integrity tests against both the snapshot on the live cluster and the [backup cluster's recovery].
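The cross-check script mentioned above could look something like this minimal Python sketch. It assumes you have already captured the text output of hdfs dfs -checksum for the same files on both clusters, and that each output line holds the path first and the checksum last, separated by whitespace (so paths containing spaces are not handled). The function names and sample paths are hypothetical. Keep in mind that HDFS file checksums are only comparable when both clusters use the same block size and checksum settings.

```python
from typing import Dict, List


def parse_checksums(output: str, root: str) -> Dict[str, str]:
    """Map each path (relative to root) to its checksum string.

    Assumes lines shaped like: <path> <algorithm> <checksum>.
    Lines with fewer than three fields are skipped.
    """
    result: Dict[str, str] = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        path, checksum = parts[0], parts[-1]
        # Strip the cluster-specific root so both sides share the same keys.
        rel = path[len(root):] if path.startswith(root) else path
        result[rel.lstrip("/")] = checksum
    return result


def compare(source: Dict[str, str], target: Dict[str, str]) -> List[str]:
    """Return human-readable differences; an empty list means in sync."""
    problems: List[str] = []
    for rel in sorted(set(source) | set(target)):
        if rel not in target:
            problems.append(f"missing on target: {rel}")
        elif rel not in source:
            problems.append(f"extra on target: {rel}")
        elif source[rel] != target[rel]:
            problems.append(f"checksum mismatch: {rel}")
    return problems


if __name__ == "__main__":
    # Made-up sample output for illustration only.
    live = "/data/.snapshot/s1/a.txt\tMD5-of-0MD5-of-512CRC32C\tabc123\n"
    dr = "/data/a.txt\tMD5-of-0MD5-of-512CRC32C\tabc123\n"
    diffs = compare(parse_checksums(live, "/data/.snapshot/s1"),
                    parse_checksums(dr, "/data"))
    print(diffs or "in sync")
```

Comparing against the snapshot path on the live cluster, rather than the live directory, keeps the check stable while new data is still arriving.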
Created 09-01-2016 02:02 AM
The links seem to be showing now. If you still can't see them, PM me and I'll send them to you directly.
Created 09-01-2016 02:09 AM
Although I have described the DIY process, Falcon does support snapshots and simply runs distcp underneath. This should be mentioned even though you asked for a workaround that avoids Falcon.
Created 09-01-2016 02:14 AM
I like my answer but you should also check out https://community.hortonworks.com/questions/394/what-are-best-practices-for-setting-up-backup-and.ht...