DR with Falcon: handling changing data; distcp validation; snapshotting
- Labels: Apache Falcon, Apache Hadoop, Apache Oozie
Created 05-04-2016 03:35 PM
- Using either feed-based replication or the mirror recipe in Falcon (both of which, to my understanding, leverage DistCp), how does Falcon handle the situation where clients are still writing, moving, or deleting files in the source cluster? The DistCp documentation states that if another client is still writing to a source file, the copy will likely fail.
- Does Falcon provide any data validation mechanism to confirm that the transfer with DistCp was successful?
- What additional benefit would snapshotting have here? (And does Falcon do this?)
Created 05-09-2016 06:42 PM
- As you mentioned, Falcon piggybacks on DistCp under the hood to achieve replication, so if another client is still writing to a source file, the copy will likely fail.
- If the DistCp job fails, the Falcon replication job fails too, and the status API/command can be used to get the final status of the replication job. The same applies when the job succeeds. Also, FALCON-1313 added support for email-based notification of job status for feeds and mirror recipes.
- Replication using snapshots is not yet supported in Falcon; the feature is being added with FALCON-1861. The additional benefit is performance. It leverages HDFS snapshots, which are very cheap to create (the cost is O(1), excluding inode lookup time). Once a snapshot is created, it is very efficient to find the modifications relative to it and copy only those modifications over for disaster recovery (DR). This makes snapshot-based replication cost-effective.
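To make the snapshot point concrete, here is a toy Python model of diff-based replication. It is a sketch under stated assumptions, not Falcon's or HDFS's implementation. A "snapshot" is modeled as a plain dict of path to file bytes, and the names `diff_snapshots`, `apply_diff`, and `SnapshotDiff` are hypothetical. The idea it shows is that only files created, deleted, or modified between two frozen snapshots are touched on the DR side, and their content is read from the snapshot rather than from the live directory that clients may still be writing to. Unlike this toy, which compares the two mappings in full, HDFS records modifications relative to a snapshot as they happen. That is where the efficiency mentioned above comes from.

```python
from dataclasses import dataclass

# Toy model (assumption): a snapshot is an immutable point-in-time mapping of
# path -> file content. Real HDFS snapshots are read-only namespace views.
Snapshot = dict[str, bytes]


@dataclass
class SnapshotDiff:
    created: list[str]
    deleted: list[str]
    modified: list[str]


def diff_snapshots(old: Snapshot, new: Snapshot) -> SnapshotDiff:
    """Compare two snapshots of the source directory (hypothetical helper)."""
    created = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return SnapshotDiff(created, deleted, modified)


def apply_diff(target: dict[str, bytes], new: Snapshot, diff: SnapshotDiff) -> int:
    """Bring the DR copy in line with `new`; returns the number of files copied."""
    for path in diff.deleted:
        target.pop(path, None)
    copied = 0
    for path in diff.created + diff.modified:
        # Content is read from the frozen snapshot, not the live directory,
        # so a client still writing to the live file cannot break this copy.
        target[path] = new[path]
        copied += 1
    return copied
```

For example, with `s1 = {"/data/a": b"1", "/data/b": b"2"}` and `s2 = {"/data/b": b"2x", "/data/c": b"3"}`, the diff reports `/data/c` as created, `/data/a` as deleted, and `/data/b` as modified. Applying it to a DR copy that matched `s1` copies two files and leaves the DR copy equal to `s2`. Unchanged files are never transferred.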
Created 05-04-2016 03:42 PM
I think one or two of your questions are answered here: link
Created 05-09-2016 05:39 AM
@piotr pruski
Nice question. Would help a lot of people in the community.
Thank you.
Created 11-18-2016 06:53 PM
What about support for the -overwrite and -update flags in HDFS Mirror (Falcon)?
Created 05-11-2016 06:17 PM
To add to Sowmya's response:
If a Falcon Mirror process fails, Falcon will attempt the copy again, so a file that was only momentarily open will be captured on the retry.
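The retry behavior described above can be illustrated with a small Python sketch. This is not Falcon's implementation: Falcon's real retry settings live in its entity definitions. The names `copy_with_retries`, `SourceFileOpenError`, and `FlakySource` are hypothetical. The sketch models a copy that fails while a client holds the source file open and succeeds once the file is closed. It also shows the limit of this approach: if the file stays open across every attempt, the job still fails, and that failure surfaces in the job status.

```python
import time
from typing import Callable


class SourceFileOpenError(Exception):
    """Hypothetical error: the source file is still open for write."""


def copy_with_retries(copy_once: Callable[[], None], max_attempts: int = 3,
                      backoff_seconds: float = 0.0) -> int:
    """Run copy_once until it succeeds; return the attempt number that succeeded.

    Re-raises the last error when every attempt fails, so the caller still
    sees the replication as failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            copy_once()
            return attempt
        except SourceFileOpenError:
            if attempt == max_attempts:
                raise
            time.sleep(backoff_seconds * attempt)  # linear backoff between attempts


class FlakySource:
    """Simulates a file a client keeps open for the first `open_for` copy attempts."""

    def __init__(self, open_for: int) -> None:
        self.open_for = open_for
        self.calls = 0
        self.copied = False

    def copy(self) -> None:
        self.calls += 1
        if self.calls <= self.open_for:
            raise SourceFileOpenError("source file still open for write")
        self.copied = True
```

With `FlakySource(open_for=1)`, the first attempt fails and the second succeeds, so `copy_with_retries` returns `2`. With `FlakySource(open_for=5)` and three attempts, every attempt fails and the error propagates. This mirrors the point above that a retry only helps when the file is momentarily, not continuously, open.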
