<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Apache NiFi to split incoming data from a file based on condition into 3 flows in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Apache-NiFi-to-split-incoming-data-from-a-file-based-on/m-p/220284#M182169</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/75261/hksinghdec.html" nodeid="75261" target="_blank"&gt;@Hemu Singh&lt;/A&gt;&lt;/P&gt;&lt;P&gt;For this use case, use the &lt;STRONG&gt;QueryRecord&lt;/STRONG&gt; processor. It reads the FlowFile content with the configured &lt;STRONG&gt;Record Reader&lt;/STRONG&gt; controller service and executes &lt;STRONG&gt;SQL queries on the FlowFile content&lt;/STRONG&gt;. The &lt;STRONG&gt;result of each SQL query&lt;/STRONG&gt; then &lt;STRONG&gt;becomes&lt;/STRONG&gt; the content of the &lt;STRONG&gt;output FlowFile&lt;/STRONG&gt;, in the &lt;STRONG&gt;format specified&lt;/STRONG&gt; by the &lt;STRONG&gt;Record Writer&lt;/STRONG&gt; controller service.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Flow:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="68518-flow.png" style="width: 1867px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/15901i4CF2592551BC98E3/image-size/medium?v=v2&amp;amp;px=400" role="button" title="68518-flow.png" alt="68518-flow.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Flow Explanation:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;1. GenerateFlowFile // adds some test data&lt;/P&gt;&lt;P&gt;2. UpdateAttribute // adds the schema to the FlowFile&lt;/P&gt;&lt;P&gt;3. Filter Column QueryRecord //&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;3.1. Configure and enable the &lt;STRONG&gt;Record Reader/Writer&lt;/STRONG&gt; controller services. 
&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;3.2. Add a new property that runs a SQL query with a WHERE clause on the FlowFile.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="68519-filtercolumn.png" style="width: 1800px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/15902i94211833D8617D84/image-size/medium?v=v2&amp;amp;px=400" role="button" title="68519-filtercolumn.png" alt="68519-filtercolumn.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;3.3. Use the original relationship to keep the file as is, i.e. with all 100 records in it.&lt;BR /&gt;4. QueryRecord //&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;4.1. Add two new properties that use the ROW_NUMBER window function (my FlowFile has an &lt;STRONG&gt;id column&lt;/STRONG&gt;): one routes the first 75 records to one relationship, the other routes records 76 to 100 to a second relationship.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;First 75 records&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;select * from (select *,Row_Number() over(order by id asc) as rn from FLOWFILE) r where r.rn &amp;lt;= 75&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Records 76 to 100
&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;select * from 
(select *,Row_Number() over(order by id asc) as rn from FLOWFILE) r where r.rn &amp;gt; 75 and r.rn &amp;lt;= 100 
&lt;/PRE&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="68520-queryrecord.png" style="width: 2520px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/15903i7397D231B08728C0/image-size/medium?v=v2&amp;amp;px=400" role="button" title="68520-queryrecord.png" alt="68520-queryrecord.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Use the two relationships above (first75records and 76to100 record) for further processing.&lt;/P&gt;&lt;P&gt;In addition, QueryRecord also supports LIMIT and OFFSET, so you can use either ROW_NUMBER or LIMIT/OFFSET to get only the desired 75 records from the FlowFile (for example, ORDER BY id ASC LIMIT 75).&lt;/P&gt;&lt;P&gt;Please refer to &lt;A href="https://community.hortonworks.com/articles/121794/running-sql-on-flowfiles-using-queryrecord-process.html" target="_blank" rel="nofollow noopener noreferrer"&gt;this&lt;/A&gt; and &lt;A href="https://community.hortonworks.com/questions/178446/splitting-single-file-in-to-two-file-based-on-colu.html?childToView=178482#answer-178482" target="_blank" rel="nofollow noopener noreferrer"&gt;this&lt;/A&gt; for QueryRecord processor configuration and usage.&lt;/P&gt;</description>
    <pubDate>Sun, 18 Aug 2019 02:30:30 GMT</pubDate>
    <dc:creator>Shu_ashu</dc:creator>
    <dc:date>2019-08-18T02:30:30Z</dc:date>
  </channel>
</rss>

