DR with Falcon: handling changing data; distcp validation; snapshotting

New Contributor
Looking for best practices around DR replication with Falcon (or Oozie + DistCp).
  1. Using either feed-based replication or the mirror recipe in Falcon (both of which use DistCp, to my understanding), how does it handle clients that are still writing, moving, or deleting files in the source cluster?

    The DistCp documentation states that if another client is still writing to a source file, the copy will likely fail.

  2. Does Falcon provide any data validation mechanism to confirm that the DistCp transfer was successful?
  3. What additional benefit would snapshotting have here? (And does Falcon do this?)

Accepted Solutions

Re: DR with Falcon: handling changing data; distcp validation; snapshotting

@Piotr Pruski:

  1. As you mentioned, Falcon piggybacks on DistCp under the hood to achieve replication. If another client is still writing to a source file, the copy will likely fail.
  2. If the DistCp job fails, the Falcon replication job fails too, and the status API or command can be used to get the final status of the replication job. The same applies on success. In addition, FALCON-1313 added support for email notification of job status for feeds and mirror recipes.
  3. Replication using snapshots is not yet supported in Falcon; the feature is being added under FALCON-1861. The additional benefit is performance. It leverages HDFS snapshots, which are very cheap to create (the cost is O(1), excluding inode lookup time). Once a snapshot exists, it is very efficient to find the modifications relative to it and copy only those over for disaster recovery (DR), which makes the approach cost-effective.
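
To make point 3 more concrete, here is a minimal sketch of what one snapshot-based incremental sync could look like if you drove it by hand with the stock HDFS and DistCp command-line tools. This is not how Falcon implements FALCON-1861. The helper name `snapshot_replication_commands`, the paths, and the snapshot names are made up for illustration. The sketch only builds the command lines and does not run them. It assumes the source directory has already been made snapshottable (`hdfs dfsadmin -allowSnapshot`), and that the target already holds an unmodified copy with a snapshot named the same as `previous`, which `distcp -diff` requires.

```python
from typing import List


def snapshot_replication_commands(
    source_dir: str, target_dir: str, previous: str, current: str
) -> List[List[str]]:
    """Build (but do not run) the commands for one incremental sync.

    All argument values are hypothetical examples supplied by the caller.
    """
    return [
        # Take a new read-only snapshot of the source directory.
        ["hdfs", "dfs", "-createSnapshot", source_dir, current],
        # Report what changed between the two snapshots (for inspection only).
        ["hdfs", "snapshotDiff", source_dir, previous, current],
        # Copy only the changes; -diff has to be combined with -update.
        ["hadoop", "distcp", "-update", "-diff", previous, current,
         source_dir, target_dir],
    ]


if __name__ == "__main__":
    # Hypothetical source path, DR cluster URI, and snapshot names.
    for argv in snapshot_replication_commands(
        "/data/events", "hdfs://dr-cluster/data/events", "s1", "s2"
    ):
        print(" ".join(argv))
```

Because the copy reads from a snapshot rather than the live directory, clients writing to the source during the copy should not affect the files being read, which also bears on question 1.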

5 REPLIES

Re: DR with Falcon: handling changing data; distcp validation; snapshotting

@Piotr Pruski

I think one or two of your questions are already answered here: link

Re: DR with Falcon: handling changing data; distcp validation; snapshotting

@piotr pruski

Nice question. Would help a lot of people in the community.

Thank you.

Re: DR with Falcon: handling changing data; distcp validation; snapshotting

Rising Star

What about support for the -overwrite and -update flags in HDFS Mirror (Falcon)?

Re: DR with Falcon: handling changing data; distcp validation; snapshotting

Contributor

To add to Sowmya's response:

If a Falcon Mirror process fails, Falcon will attempt the copy again, so a file that was momentarily open will be captured on the retry.
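
The retry behaviour described above can be illustrated with a small, self-contained sketch. It is a toy model, not Falcon's code: `copy_with_retries` and `make_flaky_copy` are hypothetical names, and the "copy" is a stand-in function that fails a fixed number of times (as if a file were still open) before succeeding.

```python
import time
from typing import Callable


def copy_with_retries(
    copy_once: Callable[[], bool], attempts: int = 3, delay_seconds: float = 0.0
) -> int:
    """Call copy_once until it reports success; return the attempts used.

    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        if copy_once():
            return attempt
        if attempt < attempts:
            # Wait before retrying, giving writers time to close their files.
            time.sleep(delay_seconds)
    raise RuntimeError(f"copy failed after {attempts} attempts")


def make_flaky_copy(failures_before_success: int) -> Callable[[], bool]:
    """Simulate a copy that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def copy_once() -> bool:
        state["calls"] += 1
        return state["calls"] > failures_before_success

    return copy_once


if __name__ == "__main__":
    # One simulated failure (file momentarily open), then success on retry.
    used = copy_with_retries(make_flaky_copy(1))
    print(f"copy succeeded on attempt {used}")
```

A retry only helps when the writer closes the file between attempts; a file that stays open for longer than the whole retry window would still make the job fail.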