ppruski · ‎05-04-2016

Looking for best practices around DR replication options with Falcon (or Oozie + DistCp).

Using either feed-based replication or the mirror recipe in Falcon (both of which leverage DistCp, to my understanding), how does it handle the situation where clients are still writing, moving, or deleting files in the source cluster?
The DistCp documentation states that if another client is still writing to a source file, the copy will likely fail.
Does Falcon provide any mechanism to validate that the DistCp transfer was successful?
What additional benefit would snapshotting have here (and does Falcon do this)?

sramesh · ‎05-09-2016

As you mentioned, Falcon piggybacks on DistCp under the hood to achieve replication. If another client is still writing to a source file, the copy will likely fail.
If the DistCp job fails, the Falcon replication job fails too, and the status API/command can be used to get the final status of the replication job. The same applies on success. Also, FALCON-1313 added support for email-based notification of job status for feeds and mirror recipes.
Replication using snapshots is not yet supported in Falcon; the feature is being added with FALCON-1861. The additional benefit is performance. It leverages HDFS snapshots, which are very cheap to create (the cost is O(1), excluding inode lookup time). Once a snapshot exists, it is very efficient to find the modifications relative to it and copy only those over for disaster recovery (DR). This is what makes it cost-effective.
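
For anyone who wants to try the snapshot approach by hand while waiting for FALCON-1861, here is a rough sketch of the command sequence for one incremental run. It is not how Falcon does it, just plain HDFS snapshots plus DistCp's -diff option. The helper name, paths, and snapshot names are made up for illustration. It assumes both directories are already snapshottable (hdfs dfsadmin -allowSnapshot), that the target already holds the `previous` snapshot, and that the target has not been modified since.

```python
from typing import List


def snapshot_commands(source_dir: str, target_dir: str,
                      previous: str, current: str) -> List[List[str]]:
    """Build the commands for one snapshot-based incremental copy.

    Hypothetical helper: it only assembles argument lists and runs nothing.
    """
    return [
        # Take a new read-only snapshot of the source directory.
        ["hdfs", "dfs", "-createSnapshot", source_dir, current],
        # Copy only what changed between the two source snapshots.
        # DistCp requires -update whenever -diff is used.
        ["hadoop", "distcp", "-update", "-diff", previous, current,
         source_dir, target_dir],
        # Snapshot the target under the same name so the next run can
        # use `current` as its starting point.
        ["hdfs", "dfs", "-createSnapshot", target_dir, current],
    ]


if __name__ == "__main__":
    for cmd in snapshot_commands("/data/in", "hdfs://dr-nn/data/in", "s1", "s2"):
        print(" ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) on a node with the Hadoop client installed. Because the copy reads from a snapshot, files that clients are still writing in the live directory do not disturb it.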

divakarreddy_a · ‎05-04-2016

I think we have answers here for one or two of your questions: link

rbiswas1 · ‎05-09-2016

@piotr pruski

Nice question. It would help a lot of people in the community.

Thank you.

ambud_sharma1 · ‎11-18-2016

What about support for the -overwrite and -update flags in HDFS Mirror (Falcon)?

cnormile · ‎05-11-2016

To add to Sowmya's response:

If a Falcon Mirror process fails, Falcon will attempt the copy again, so a file momentarily open will be captured on the retry.
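
To make the retry idea concrete, here is a minimal sketch of the pattern. This is not Falcon's code; `run_with_retries`, `copy_once`, and the default attempt count and delay are all hypothetical stand-ins. The copy is attempted up to a fixed number of times with a pause in between, so a file that was open on the first pass has a chance to be closed by the next one.

```python
import time
from typing import Callable


def run_with_retries(copy_once: Callable[[], bool], attempts: int = 3,
                     delay_seconds: float = 600.0,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Call copy_once until it returns True; return the attempt number used.

    copy_once is a hypothetical callable that launches one copy job and
    reports whether it succeeded. Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        if copy_once():
            return attempt
        # Wait before the next try, but not after the last failure.
        if attempt < attempts:
            sleep(delay_seconds)
    raise RuntimeError(f"copy failed after {attempts} attempts")
```

Note that a retry only helps with files that are open briefly. A file that is being appended to continuously will keep failing, which is where copying from a snapshot is the safer option.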

DR with Falcon: handling changing data; distcp validation; snapshotting