<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Use column values of a csv file to route flow files in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Use-column-values-of-a-csv-file-to-route-flow-files/m-p/286601#M212543</link>
    <description>&lt;P&gt;Thanks a lot. It worked with the new regex.&lt;/P&gt;</description>
    <pubDate>Mon, 30 Dec 2019 20:39:55 GMT</pubDate>
    <dc:creator>CJoe36</dc:creator>
    <dc:date>2019-12-30T20:39:55Z</dc:date>
    <item>
      <title>Use column values of a csv file to route flow files</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-column-values-of-a-csv-file-to-route-flow-files/m-p/286526#M212494</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I would like to route flow files based on the value of a column in a CSV file. I have split a CSV file into many CSV files, each containing the header and one data row, so if the initial CSV file has 50 rows, I get 50 new CSV files. I did this with the SplitRecord processor and a CSV reader/writer. Then I use an UpdateAttribute processor to add a UUID to these 50 files. Now I would like to route those files based on a column's value. Is this possible?&lt;/P&gt;
&lt;P&gt;Let's say column 5 has numeric values and I want to route all flow files with a value greater than 50. How can I achieve that?&lt;/P&gt;
&lt;P&gt;Thank you in advance for your help and advice.&lt;/P&gt;</description>
      <pubDate>Sun, 29 Dec 2019 01:03:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-column-values-of-a-csv-file-to-route-flow-files/m-p/286526#M212494</guid>
      <dc:creator>CJoe36</dc:creator>
      <dc:date>2019-12-29T01:03:58Z</dc:date>
    </item>
    <item>
      <title>Re: Use column values of a csv file to route flow files</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-column-values-of-a-csv-file-to-route-flow-files/m-p/286537#M212500</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/72716"&gt;@CJoe36&lt;/a&gt;&amp;nbsp;couple of things you need to do here:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Get the value of the column into an attribute.&lt;/LI&gt;&lt;LI&gt;Create routes based on the value of attribute.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For #1 there are a few ways. &amp;nbsp;First, you can create a flow that uses a CSV Reader and a known schema. &amp;nbsp;Using this you can translate and parse the columns. This requires multiple processors and CSVReader Controller Service. &amp;nbsp; &amp;nbsp;Two, you can just use ExtractText with regex (one processor).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;here is example of ExtractText to get a Quantity and SKU from an inventory CSV. &amp;nbsp;Notice the regex codes used with the commas, and the () indicating which field maps to the attribute defined (sku or qty).&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2019-12-29 at 9.32.28 AM.png" style="width: 999px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/25862i1795E4034183B427/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2019-12-29 at 9.32.28 AM.png" alt="Screen Shot 2019-12-29 at 9.32.28 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;qty:&lt;/P&gt;&lt;P&gt;.*?,,.*?,,(\d+)$&lt;/P&gt;&lt;P&gt;sku:&lt;/P&gt;&lt;P&gt;(.*?),,.*?,,.*?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If your CSV is 10 columns, and #5 is your value, you probably want something like this:&lt;/P&gt;&lt;P&gt;,,,,(.*?),,,,,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For #2 you want to use RouteOnAttribute Processor with routes defined using &lt;A href="https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html" target="_blank" rel="noopener"&gt;NiFi Expression Language&lt;/A&gt;. 
Each route you define becomes a relationship that you can connect downstream of RouteOnAttribute; anything that matches no route goes to unmatched.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is an example:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2019-12-29 at 9.38.19 AM.png" style="width: 999px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/25863i8879B18666C6E2AE/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2019-12-29 at 9.38.19 AM.png" alt="Screen Shot 2019-12-29 at 9.38.19 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And the routed result:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2019-12-29 at 9.40.32 AM.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/25865i2EABA266D2F4AC15/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screen Shot 2019-12-29 at 9.40.32 AM.png" alt="Screen Shot 2019-12-29 at 9.40.32 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If this reply helps answer your question, please mark it as a Solution.&lt;/P&gt;</description>
      <pubDate>Sun, 29 Dec 2019 14:46:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-column-values-of-a-csv-file-to-route-flow-files/m-p/286537#M212500</guid>
      <dc:creator>stevenmatison</dc:creator>
      <dc:date>2019-12-29T14:46:40Z</dc:date>
    </item>
    <item>
      <title>Re: Use column values of a csv file to route flow files</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-column-values-of-a-csv-file-to-route-flow-files/m-p/286547#M212504</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/60150"&gt;@stevenmatison&lt;/a&gt;&amp;nbsp;Thanks a lot for your reply. very much appreciated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried it but got stuck in the ExtractText processor.&lt;/P&gt;&lt;P&gt;My real file consists of 13 columns and #4 is the relevant one with the needed value. The input Flow File that feeds the ExtractText processor looks like this:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="InputForExtractText.png" style="width: 733px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/25871i0947BE02F9648448/image-dimensions/733x50?v=v2" width="733" height="50" role="button" title="InputForExtractText.png" alt="InputForExtractText.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;#4 is value 25.0 (before "cpm")&lt;/P&gt;&lt;P&gt;Meanwhile i removed the headers since i was not sure if that has got an impact or not.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So what i did next in ExtractText processor is creating a new property named Value:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ExtractText.png" style="width: 777px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/25870iE836099F1E0AAD8E/image-size/large?v=v2&amp;amp;px=999" role="button" title="ExtractText.png" alt="ExtractText.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="5"&gt;,,,(.*?),,,,,,,,,&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But when i run the workflow all the flow files goes unmatched off the ExtractText processor. So there are no matched ones that go to RouteOnAttribute.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I really don't know what went wrong here.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you &amp;amp; Regards&lt;/P&gt;</description>
      <pubDate>Sun, 29 Dec 2019 23:34:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-column-values-of-a-csv-file-to-route-flow-files/m-p/286547#M212504</guid>
      <dc:creator>CJoe36</dc:creator>
      <dc:date>2019-12-29T23:34:07Z</dc:date>
    </item>
    <item>
      <title>Re: Use column values of a csv file to route flow files</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-column-values-of-a-csv-file-to-route-flow-files/m-p/286585#M212532</link>
      <description>&lt;P&gt;It is not working for your use case, but I tried to show as much of mine as I could, and I think you understand the concept. So the next trick is just getting the regex to match your string.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Try a tool such as:&amp;nbsp;&lt;A href="https://regex101.com" target="_self"&gt;https://regex101.com&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In this tool I quickly matched a regex to the 4th column:&lt;/P&gt;&lt;P&gt;.*?,.*?,.*?,(.*?),.*&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;against:&lt;/P&gt;&lt;P&gt;test,test,test,25.0,test,test,test&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For your example, the above should work. It is not necessary to parse past the column you need, so the .* at the end picks up the entire rest of the line.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 30 Dec 2019 14:23:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-column-values-of-a-csv-file-to-route-flow-files/m-p/286585#M212532</guid>
      <dc:creator>stevenmatison</dc:creator>
      <dc:date>2019-12-30T14:23:39Z</dc:date>
    </item>
    <item>
      <title>Re: Use column values of a csv file to route flow files</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-column-values-of-a-csv-file-to-route-flow-files/m-p/286601#M212543</link>
      <description>&lt;P&gt;Thanks a lot. It worked with the new regex.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Dec 2019 20:39:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-column-values-of-a-csv-file-to-route-flow-files/m-p/286601#M212543</guid>
      <dc:creator>CJoe36</dc:creator>
      <dc:date>2019-12-30T20:39:55Z</dc:date>
    </item>
  </channel>
</rss>

