<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How does falcon handle late arriving data on target cluster? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-falcon-handle-late-arriving-data-on-target-cluster/m-p/117560#M26291</link>
    <description>Whether the output will be deleted/refreshed or simply appended depends on the process defined by the user. Falcon just reruns the process instance with the late-arriving input data.</description>
    <pubDate>Thu, 28 Apr 2016 03:52:42 GMT</pubDate>
    <dc:creator>tillmanstory</dc:creator>
    <dc:date>2016-04-28T03:52:42Z</dc:date>
    <item>
      <title>How does falcon handle late arriving data on target cluster?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-falcon-handle-late-arriving-data-on-target-cluster/m-p/117557#M26288</link>
      <description>&lt;P&gt;I have read the Apache Falcon documentation on late-arriving data; however, I need to understand further how it handles data on the target cluster side. If late-arriving data is detected (on the source cluster), is the data in the target location deleted/refreshed or simply appended?&lt;/P&gt;</description>
      <pubDate>Wed, 27 Apr 2016 04:22:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-falcon-handle-late-arriving-data-on-target-cluster/m-p/117557#M26288</guid>
      <dc:creator>sunile_manjee</dc:creator>
      <dc:date>2016-04-27T04:22:38Z</dc:date>
    </item>
    <item>
      <title>Re: How does falcon handle late arriving data on target cluster?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-falcon-handle-late-arriving-data-on-target-cluster/m-p/117558#M26289</link>
      <description>&lt;P&gt;If the input data arrive late but within the cut-off time (defined in the feed), Falcon will rerun the instance and update the output. If the input data arrive later than the cut-off time, Falcon will not rerun the instance but will mark it as timed out.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Apr 2016 05:04:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-falcon-handle-late-arriving-data-on-target-cluster/m-p/117558#M26289</guid>
      <dc:creator>yzheng</dc:creator>
      <dc:date>2016-04-27T05:04:28Z</dc:date>
    </item>
    <item>
      <title>Re: How does falcon handle late arriving data on target cluster?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-falcon-handle-late-arriving-data-on-target-cluster/m-p/117559#M26290</link>
      <description>&lt;P&gt;Whether the output will be deleted/refreshed or simply appended depends on the process defined by the user. Falcon just reruns the process instance with the late-arriving input data.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Apr 2016 05:07:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-falcon-handle-late-arriving-data-on-target-cluster/m-p/117559#M26290</guid>
      <dc:creator>yzheng</dc:creator>
      <dc:date>2016-04-27T05:07:34Z</dc:date>
    </item>
    <item>
      <title>Re: How does falcon handle late arriving data on target cluster?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-falcon-handle-late-arriving-data-on-target-cluster/m-p/117562#M26293</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1486/smanjee.html" nodeid="1486"&gt;@Sunile Manjee&lt;/A&gt; &lt;/P&gt;&lt;P&gt;The supported policies for late data handling are:&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;backoff:&lt;/STRONG&gt; Take the maximum late cut-off and check every specified time.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;exp-backoff (default):&lt;/STRONG&gt; Recommended. Take the maximum cut-off date and check on an exponentially determined time.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;final:&lt;/STRONG&gt;Take the maximum late cut-off and check once.   &lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;For example, a late cut-off of hours (8) means data can be delayed by up to 8 hours:&lt;/P&gt;&amp;lt;late-arrival cut-off="hours(6)”/&amp;gt;&lt;P&gt;The, late input in the following process specification is handled by the /apps/myapp/latehandle workflow:&lt;/P&gt;&lt;P&gt;&amp;lt;late-process policy="exp-backoff" delay="hours(2)”&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;lt;late-input input="input" workflow-path="&lt;STRONG&gt;/apps/myapp/latehandle&lt;/STRONG&gt;" /&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;lt;/late-process&amp;gt;&lt;/P&gt;&lt;P&gt;So this means that for 8 hours till feed arrives the workflow will be retried. Once the feed arrives within that window, the window will be reset.&lt;/P&gt;&lt;P&gt;Now inside /apps/myapp/latehandle you can put your own logic (It may be a sqoop/hive/shell etc etc). The processing  here will determine what will happen to that late feed. For simplified scenarios we can run the actual workflow or might modify for a special workflow which handles the dependencies and boundary cases.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 06 May 2016 10:29:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-does-falcon-handle-late-arriving-data-on-target-cluster/m-p/117562#M26293</guid>
      <dc:creator>rbiswas1</dc:creator>
      <dc:date>2016-05-06T10:29:09Z</dc:date>
    </item>
  </channel>
</rss>

