DR with Falcon: handling changing data; distcp validation; snapshotting

Contributor
Looking for best practices around DR replication options with Falcon (or Oozie + distcp).
  1. Using either feed-based replication or the mirror recipe in Falcon (both of which leverage distcp, to my understanding), how does it handle the situation where clients are still writing, moving, or deleting files in the source cluster?

    The distcp documentation states that if another client is still writing to a source file, the copy will likely fail.

  2. Does Falcon provide any data validation mechanism to confirm that the distcp transfer was successful?
  3. What additional benefit would snapshotting have here? (And does Falcon do this?)
1 ACCEPTED SOLUTION

@Piotr Pruski:

  1. As you mentioned, Falcon piggybacks on DistCp under the hood to achieve replication. If another client is still writing to a source file, the copy will likely fail.
  2. If the DistCp job fails, the Falcon replication job fails too, and the status API/command can be used to get the final status of the replication job. The same applies on success. Also, FALCON-1313 added support for email-based notification of job status for feeds and mirror recipes.
  3. Replication using snapshots is not yet supported in Falcon; the feature is being added under FALCON-1861. The additional benefit is performance. It leverages HDFS snapshots, which are very cheap to create (the cost is O(1), excluding inode lookup time). Once a snapshot exists, it is very efficient to find the modifications relative to it and copy only those over for disaster recovery (DR). This is what makes it cost-effective.
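
To make point 3 a bit more concrete, here is a minimal sketch of one snapshot-diff replication cycle done with plain HDFS and DistCp commands, outside of Falcon. It is only an illustration of the idea, not how FALCON-1861 implements it. The helper name, cluster URIs, and snapshot names are hypothetical, and it assumes both directories are already snapshottable, that the target holds a snapshot with the same name as the previous source snapshot, and that the target has not been modified since.

```python
import shlex


def snapshot_diff_commands(src_dir, dst_dir, prev_snap, new_snap):
    """Build (but do not run) the commands for one incremental DR cycle.

    Hypothetical helper: all paths and snapshot names are supplied by the caller.
    """
    return [
        # 1. Freeze the current state of the source directory.
        ["hdfs", "dfs", "-createSnapshot", src_dir, new_snap],
        # 2. Copy only what changed between the two source snapshots.
        #    DistCp requires -update when -diff is used.
        ["hadoop", "distcp", "-update", "-diff", prev_snap, new_snap,
         src_dir, dst_dir],
        # 3. Snapshot the target under the same name so the next cycle
        #    has a matching baseline on both sides.
        ["hdfs", "dfs", "-createSnapshot", dst_dir, new_snap],
    ]


if __name__ == "__main__":
    # Example values only; substitute real cluster URIs and snapshot names.
    for cmd in snapshot_diff_commands(
        "hdfs://primary/data/events", "hdfs://dr/data/events", "s1", "s2"
    ):
        print(shlex.join(cmd))
```

The function only prints the commands so they can be reviewed before anything is executed against a cluster.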

5 REPLIES

@Piotr Pruski

I think we have answers here for one or two of your questions: link

@piotr pruski

Nice question. Would help a lot of people in the community.

Thank you.

Expert Contributor

What about support for the -overwrite and -update flags in HDFS Mirror (Falcon)?

Rising Star

To add to Sowmya's response:

If a Falcon Mirror process fails, Falcon will attempt the copy again, so a file momentarily open will be captured on the retry.
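
As a rough picture of that retry behaviour, here is a toy model. It is not Falcon's code: the exception type, the function names, and the attempt count and delay are all made up for illustration, and the copy step is a stand-in for a real DistCp run.

```python
import time


class SourceFileOpenError(Exception):
    """Hypothetical error: a source file was still being written during the copy."""


def copy_with_retries(copy_once, attempts=3, delay_seconds=0.0):
    """Call copy_once() until it succeeds or the attempts run out.

    Returns the attempt number that succeeded; re-raises the last error
    if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            copy_once()
            return attempt
        except SourceFileOpenError:
            if attempt == attempts:
                raise
            # Give the writer time to close the file before trying again.
            time.sleep(delay_seconds)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_copy():
        # Simulate a file that is open on the first attempt only.
        state["calls"] += 1
        if state["calls"] == 1:
            raise SourceFileOpenError("file still open")

    print("succeeded on attempt", copy_with_retries(flaky_copy))
```

Note that a retry only helps when the file is open briefly; a file that is written to continuously would exhaust the attempts in this model.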