<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Falcon mirroring assumptions and guarantees in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Falcon-mirroring-assumptions-and-guarantees/m-p/96087#M59435</link>
    <description>&lt;P&gt;Falcon supports mirroring for HDFS and Hive.&lt;/P&gt;&lt;P&gt;The performance issue I mentioned above applies only to HDFS mirroring, and only if replicated data is not evicted. Hive mirroring is not affected because Falcon saves the ID of the last successfully replicated event in the data store, and the next replication job starts replicating past that event ID. Falcon also cleans up the staging paths used for export after the job runs. Because DistCp copies only the new data to be replicated, there is no performance overhead for Hive mirroring.&lt;/P&gt;&lt;P&gt;Just an FYI.&lt;/P&gt;</description>
    <pubDate>Wed, 28 Oct 2015 08:21:50 GMT</pubDate>
    <dc:creator>sramesh</dc:creator>
    <dc:date>2015-10-28T08:21:50Z</dc:date>
    <item>
      <title>Falcon mirroring assumptions and guarantees</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Falcon-mirroring-assumptions-and-guarantees/m-p/96082#M59430</link>
      <description>&lt;P&gt;Do we have a detailed technical write-up on Falcon mirroring? It uses distcp under the hood, and I can only assume it uses the -update option, but are there any exceptions to how precisely it follows the distcp docs/functionality? I'm mostly concerned with partially-completed jobs that might have tmp files hanging around when the copy kicks off. I have a use case where the user would like to use mirroring to replicate 1..n feeds within a directory instead of setting up fine-grained feed replication, e.g. &lt;/P&gt;&lt;P&gt;mirror job 1=&lt;/P&gt;&lt;P&gt;- /data/cust/cust1&lt;/P&gt;&lt;P&gt;    - /feed-1&lt;/P&gt;&lt;P&gt;    - /feed-n&lt;/P&gt;&lt;P&gt;mirror job 2=&lt;/P&gt;&lt;P&gt;- /data/cust/cust2&lt;/P&gt;&lt;P&gt;    - /feed-1&lt;/P&gt;&lt;P&gt;    - /feed-n&lt;/P&gt;&lt;P&gt;Any info is appreciated.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Oct 2015 23:58:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Falcon-mirroring-assumptions-and-guarantees/m-p/96082#M59430</guid>
      <dc:creator>mmiklavcic</dc:creator>
      <dc:date>2015-10-27T23:58:26Z</dc:date>
    </item>
    <item>
      <title>Re: Falcon mirroring assumptions and guarantees</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Falcon-mirroring-assumptions-and-guarantees/m-p/96083#M59431</link>
      <description>&lt;P&gt;Today, replication in Falcon can be achieved in two ways:&lt;/P&gt;&lt;P&gt;1&amp;gt; Feed-based replication: Falcon uses a pull-based replication mechanism, meaning that in every target cluster, for a given source cluster, a coordinator is scheduled that pulls the data from the source cluster using DistCp. This requires the data locations being replicated to have dated partitions.&lt;/P&gt;&lt;P&gt;2&amp;gt; Recipes:&lt;/P&gt;&lt;P&gt;HDFS Directory Replication Recipe&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Overview&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;This recipe replicates arbitrary directories on HDFS from one Hadoop cluster to another. It piggybacks on the replication solution in Falcon, which uses the DistCp tool.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Use Case&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;* Copy directories between HDFS clusters without dated partitions&lt;/P&gt;&lt;P&gt;* Archive directories from HDFS to the cloud, e.g. S3 or Azure WASB&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Limitations&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;As the data volume and number of files grow, this can get inefficient. The user should make sure that data already replicated is evicted; otherwise there will be performance issues.&lt;/P&gt;&lt;P&gt;For both of the above mechanisms, &lt;A href="http://falcon.apache.org/0.7/DistCp.html"&gt;DistCp&lt;/A&gt; options can be passed as custom properties, which are propagated to the &lt;A href="http://falcon.apache.org/0.7/DistCp.html"&gt;DistCp&lt;/A&gt; tool.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;maxMaps: the maximum number of maps used during replication&lt;/LI&gt;&lt;LI&gt;mapBandwidth: the bandwidth in MB/s used by each mapper during replication&lt;/LI&gt;&lt;LI&gt;overwrite: overwrite the destination during replication&lt;/LI&gt;&lt;LI&gt;ignoreErrors: ignore failures during replication so that they do not cause the job to fail&lt;/LI&gt;&lt;LI&gt;skipChecksum: bypass checksum verification during replication&lt;/LI&gt;&lt;LI&gt;removeDeletedFiles: delete files that exist in the destination but not in the source during replication&lt;/LI&gt;&lt;LI&gt;preserveBlockSize: preserve the block size during replication&lt;/LI&gt;&lt;LI&gt;preserveReplicationNumber: preserve the replication number during replication&lt;/LI&gt;&lt;LI&gt;preservePermission: preserve permissions during replication&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Wed, 28 Oct 2015 00:59:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Falcon-mirroring-assumptions-and-guarantees/m-p/96083#M59431</guid>
      <dc:creator>sramesh</dc:creator>
      <dc:date>2015-10-28T00:59:07Z</dc:date>
    </item>
    <item>
      <title>Re: Falcon mirroring assumptions and guarantees</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Falcon-mirroring-assumptions-and-guarantees/m-p/96084#M59432</link>
      <description>&lt;P&gt;Ok, so it's a 1-to-1 mapping of the DistCp functionality that we currently choose to expose (I added the features for maxMaps and mapBandwidth &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;). Incidentally, in HDP 2.3 the Falcon UI does not have a way to include mirror job parameters. You can include them with the traditional feed definitions.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Oct 2015 07:28:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Falcon-mirroring-assumptions-and-guarantees/m-p/96084#M59432</guid>
      <dc:creator>mmiklavcic</dc:creator>
      <dc:date>2015-10-28T07:28:16Z</dc:date>
    </item>
    <item>
      <title>Re: Falcon mirroring assumptions and guarantees</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Falcon-mirroring-assumptions-and-guarantees/m-p/96085#M59433</link>
      <description>&lt;P&gt;For mirroring with recipes, you can include the parameters from the command line. I will create a bug to track the mirroring UI not having a way to include mirror job parameters. Thanks for bringing that up!&lt;/P&gt;</description>
      <pubDate>Wed, 28 Oct 2015 07:58:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Falcon-mirroring-assumptions-and-guarantees/m-p/96085#M59433</guid>
      <dc:creator>sramesh</dc:creator>
      <dc:date>2015-10-28T07:58:19Z</dc:date>
    </item>
    <item>
      <title>Re: Falcon mirroring assumptions and guarantees</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Falcon-mirroring-assumptions-and-guarantees/m-p/96086#M59434</link>
      <description>&lt;P&gt;&lt;A href="https://hortonworks.jira.com/browse/BUG-46884" target="_blank"&gt;https://hortonworks.jira.com/browse/BUG-46884&lt;/A&gt; has been created to track the UI issue.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Oct 2015 08:06:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Falcon-mirroring-assumptions-and-guarantees/m-p/96086#M59434</guid>
      <dc:creator>sramesh</dc:creator>
      <dc:date>2015-10-28T08:06:47Z</dc:date>
    </item>
    <item>
      <title>Re: Falcon mirroring assumptions and guarantees</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Falcon-mirroring-assumptions-and-guarantees/m-p/96087#M59435</link>
      <description>&lt;P&gt;Falcon supports mirroring for HDFS and Hive.&lt;/P&gt;&lt;P&gt;The performance issue I mentioned above applies only to HDFS mirroring, and only if replicated data is not evicted. Hive mirroring is not affected because Falcon saves the ID of the last successfully replicated event in the data store, and the next replication job starts replicating past that event ID. Falcon also cleans up the staging paths used for export after the job runs. Because DistCp copies only the new data to be replicated, there is no performance overhead for Hive mirroring.&lt;/P&gt;&lt;P&gt;Just an FYI.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Oct 2015 08:21:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Falcon-mirroring-assumptions-and-guarantees/m-p/96087#M59435</guid>
      <dc:creator>sramesh</dc:creator>
      <dc:date>2015-10-28T08:21:50Z</dc:date>
    </item>
  </channel>
</rss>

