DR/replication strategy using distcp + Oozie

Can someone please share how to use distcp + Oozie (not Falcon) for cluster DR/replication?

My understanding is that the entire distcp job will fail if any file in the path is being written to, and the best way around that would be to do the distcp against snapshots. But what is the entire end to end process?

Also, what checks can be done on the DR cluster to ensure the success of the job and that the data is synced with the metastore?

1 ACCEPTED SOLUTION

Expert Contributor

Here are a couple of good links that explain how to use snapshots. That's step 1.

Step 2 is backing up the metastore (a backup of your databases).

If you need a link that talks through DR for Hadoop, this is a good one.

I think once you go through those links you will feel better about the process.

High level

Regularly create snapshots of the file system and the databases

Store offsite (with distcp)
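As a rough illustration of the snapshot-then-copy cycle, here is a minimal Python sketch that only builds the commands rather than running them. The NameNode addresses, the path and the snapshot name are hypothetical placeholders, and the flag choice (`-update -pb`) is one reasonable option, not the only one. An Oozie coordinator could run the same commands on a schedule.

```python
import shlex


def snapshot_distcp_commands(src_nn, dst_nn, path, snapshot_name):
    """Return the commands for one snapshot-then-copy cycle.

    src_nn and dst_nn are NameNode URIs such as "hdfs://prod-nn:8020"
    (hypothetical values); path is the directory to protect.
    """
    snapshot_dir = f"{src_nn}{path}/.snapshot/{snapshot_name}"
    return [
        # One-time setup on the source cluster: mark the directory snapshottable.
        ["hdfs", "dfsadmin", "-allowSnapshot", path],
        # Take a read-only, point-in-time snapshot of the directory.
        ["hdfs", "dfs", "-createSnapshot", path, snapshot_name],
        # Copy from the snapshot instead of the live path, so files that are
        # still being written do not fail the job. -pb preserves block size.
        ["hadoop", "distcp", "-update", "-pb", snapshot_dir, f"{dst_nn}{path}"],
    ]


if __name__ == "__main__":
    commands = snapshot_distcp_commands(
        "hdfs://prod-nn:8020", "hdfs://dr-nn:8020", "/data/warehouse", "s1"
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

The first two commands are assumed to run on the source cluster, where the default file system is the production one.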

Hadoop does have checksums built in, so I'm sure you could write a script to cross-check that the backup has the correct files. But really, why not practice recovering the data into the [backup] cluster as part of your process? You could then run some integrity tests against the snapshot on the live cluster and the [backup cluster's recovery].
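A cross-check script along those lines might look like the sketch below. It assumes you have captured the output of `hdfs dfs -checksum` on both clusters and that each output line holds a path, an algorithm name and a hex checksum separated by whitespace. The function names are made up for illustration. Note that HDFS file checksums are only comparable when both copies use the same block size and checksum settings.

```python
def parse_checksums(output, root):
    """Map each path (relative to root) to its checksum.

    output is the captured text of `hdfs dfs -checksum`, assumed to hold
    one "<path> <algorithm> <hex checksum>" entry per line.
    """
    checksums = {}
    for line in output.splitlines():
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        path, _algorithm, checksum = parts
        # Drop everything up to and including the root so that source and
        # replica paths can be compared directly.
        if root in path:
            path = path[path.index(root) + len(root):]
        checksums[path] = checksum
    return checksums


def compare_checksums(source, replica):
    """Return (missing, mismatched) relative paths, each sorted."""
    missing = sorted(set(source) - set(replica))
    mismatched = sorted(
        p for p in source if p in replica and source[p] != replica[p]
    )
    return missing, mismatched
```

You would parse the source output with the snapshot directory as the root (for example `/data/.snapshot/s1`) and the DR output with the target directory as the root, then alert if either returned list is non-empty.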

Expert Contributor

The links seem to be showing now. If you still can't see them, PM me and I'll send them to you directly.

Expert Contributor

Although I have described the DIY process, Falcon does support snapshots and uses distcp underneath. This should be mentioned even though you asked for a workaround that avoids Falcon.
