<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Using Nifi to do processing on CSV file before inserting into database in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Nifi-to-do-processing-on-CSV-file-before-inserting/m-p/175225#M73058</link>
    <description>&lt;A rel="user" href="https://community.cloudera.com/users/2337/kvasko.html" nodeid="2337" target="_blank"&gt;@Kevin Vasko&lt;/A&gt;&lt;P&gt;If you know &lt;STRONG&gt;13 columns before the string&lt;/STRONG&gt; then you can use Replace Text processor that matches &lt;STRONG&gt;until 13th comma&lt;/STRONG&gt; then match everything after &lt;STRONG&gt;13th comma until last comma&lt;/STRONG&gt;(not including last comma).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Replace text processor Configs:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Change the below properties as&lt;/P&gt;&lt;P&gt;Search Value
&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;^((?:[^,]*,){13})(.*), //capture group 1 until 13th comma and capture group 2 after 13th comma until last comma&lt;/PRE&gt;
&lt;/DIV&gt;&lt;P&gt;Replacement Value
&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;$1"$2" //keep 1 group as is and add double quotes for 2 group&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;Maximum Buffer Size&lt;/P&gt;&lt;PRE&gt;1 MB //if your flowfile content is more than 1 MB then you need to change this value.&lt;/PRE&gt;&lt;P&gt;Replacement Strategy
&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;Regex Replace&lt;/PRE&gt;
&lt;/DIV&gt;&lt;P&gt;Evaluation Mode
&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;Line-by-Line&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Configs:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="47438-replacetext.png" style="width: 1727px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19822iDDDFC6B8FDB5B018/image-size/medium?v=v2&amp;amp;px=400" role="button" title="47438-replacetext.png" alt="47438-replacetext.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;By using above search value property we are just replacing as is until 13th comma and matching whole content after 13th comma until last comma and enclosing whole content in double quotes(").&lt;/P&gt;&lt;P&gt;If you are thinking that &lt;STRONG&gt;regex will blow up&lt;/STRONG&gt; by using above replace text processor configurations then you can use &lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Split Text processor&lt;/STRONG&gt; before Replace Text processor to split big &lt;STRONG&gt;csv file&lt;/STRONG&gt; each &lt;STRONG&gt;flowfile having 1 line&lt;/STRONG&gt;,Then use &lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Replace Text Processor&lt;/STRONG&gt; will work with 1 small flowfile, After replace text you can use&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Merge Content Processor&lt;/STRONG&gt;(if you want to merge small flowfiles into 1 big file again).&lt;/P&gt;</description>
    <pubDate>Sun, 18 Aug 2019 10:12:12 GMT</pubDate>
    <dc:creator>Shu_ashu</dc:creator>
    <dc:date>2019-08-18T10:12:12Z</dc:date>
    <item>
      <title>Using Nifi to do processing on CSV file before inserting into database</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Nifi-to-do-processing-on-CSV-file-before-inserting/m-p/175222#M73055</link>
      <description>&lt;P&gt;I have a CSV file that is messy.&lt;/P&gt;&lt;P&gt;I need to:&lt;/P&gt;&lt;P&gt;1. Get the date from the filename and use that as my date and append that to one of the columns.&lt;/P&gt;&lt;P&gt;2. Parse the CSV file to get the columns as the very last column is a string which has separators in the string ",". &lt;/P&gt;&lt;P&gt;The data looks like this.&lt;/P&gt;&lt;P&gt;Filename: ExampleFile_2017-09-20.LOG&lt;/P&gt;&lt;P&gt;Content:&lt;/P&gt;&lt;P&gt;23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,Failed to fit max attempts (1=&amp;gt;3), fit failing entirely (Fit Failure=True),&lt;BR /&gt;23:49:38.638,162929512814,$008EE9F6,-16777208,,,,,,,,,,Command Measure, Targets complete - Elapsed: 76064 ms,&lt;/P&gt;&lt;P&gt;The following is what will need to be inserted into the database:&lt;/P&gt;&lt;P&gt;2017-09-20 23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,"Failed to fit max 
attempts (1=&amp;gt;3), fit failing entirely (Fit Failure=True)"&lt;/P&gt;&lt;P&gt;2017-09-20 23:49:38.638,162929512814,$008EE9F6,-16777208,,,,,,,,,,"Command Measure, Targets complete - Elapsed: 76064 ms"&lt;/P&gt;&lt;P&gt;Would I need to do this inside of NiFi or some external script by calling some type of ExecuteScript?&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2018 00:46:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Nifi-to-do-processing-on-CSV-file-before-inserting/m-p/175222#M73055</guid>
      <dc:creator>kvasko</dc:creator>
      <dc:date>2018-01-04T00:46:37Z</dc:date>
    </item>
    <item>
      <title>Re: Using Nifi to do processing on CSV file before inserting into database</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Nifi-to-do-processing-on-CSV-file-before-inserting/m-p/175223#M73056</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2337/kvasko.html" nodeid="2337" target="_blank"&gt;@Kevin Vasko&lt;/A&gt;
&lt;/P&gt;&lt;P&gt;Yes we can do your case using NiFi Processors without using any external scripts.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Here is what i tried:-&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;First you need to extract the date from filename and keep it as attribute to the flowfile by using&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Update Attribute processor:-&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;add new property &lt;/P&gt;&lt;P&gt;date&lt;/P&gt;&lt;PRE&gt;${filename:substringAfter('_'):substringBefore('.')}&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Configs:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="47430-update.png" style="width: 1965px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19823i00D9239CA1676C5C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="47430-update.png" alt="47430-update.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Then use &lt;STRONG&gt;Replace text processor&lt;/STRONG&gt; to prepend the existing flowfile csv data with &lt;STRONG&gt;extracted date attribute&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Change the &lt;/P&gt;&lt;P&gt;Replacement Value
&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;${date}&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;&lt;STRONG&gt;**keep space after ${date} in above.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Maximum Buffer Size
&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;1 MB //if your flowfile content is more than 1 MB then you need to change this value.&lt;/PRE&gt;
&lt;/DIV&gt;&lt;P&gt;Replacement Strategy
&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;Prepend&lt;/PRE&gt;
&lt;/DIV&gt;&lt;P&gt;Evaluation Mode
&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;Line-by-Line&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;&lt;STRONG&gt;Configs:-&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="47431-replacetext.png" style="width: 1615px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19824iEBBC02261FD12633/image-size/medium?v=v2&amp;amp;px=400" role="button" title="47431-replacetext.png" alt="47431-replacetext.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;So in this processor we are just prepending the data with date attribute that is extracted in Update attribute processor and we are going to prepend date value Line-by-Line evoluation mode.&lt;/P&gt;&lt;P&gt;Then use another &lt;STRONG&gt;replace text processor&lt;/STRONG&gt; to replace last , with double quotes "&lt;/P&gt;&lt;P&gt;Search Value&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;(.*),(.*),(.*),&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;Replacement Value&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;$1,"$2,$3"&lt;/PRE&gt;
&lt;/DIV&gt;&lt;P&gt;Maximum Buffer Size&lt;/P&gt;&lt;PRE&gt;1 MB //if your flowfile content is more than 1 MB then you need to change this value.&lt;/PRE&gt;&lt;P&gt;Replacement Strategy&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;Regex Replace&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;Evaluation Mode&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;Line-by-Line&lt;/PRE&gt;
&lt;/DIV&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Configs:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="47432-replacetext.png" style="width: 1657px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19825iD26BD954973EB5CE/image-size/medium?v=v2&amp;amp;px=400" role="button" title="47432-replacetext.png" alt="47432-replacetext.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;So in this processor we are extracting all the data between last 2 commas and then replacing first group as is and enclosing group2 and group3 with double quotes and removing , at last.&lt;/P&gt;&lt;P&gt;Output:-&lt;/P&gt;&lt;PRE&gt;2017-09-20 23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,"Failed to fit max attempts (1=&amp;gt;3), fit failing entirely (Fit Failure=True)"
2017-09-20 23:49:38.638,162929512814,$008EE9F6,-16777208,,,,,,,,,,"Command Measure, Targets complete - Elapsed: 76064 ms" &lt;/PRE&gt;&lt;P&gt;I have attached the xml file you can use the same xml file and change the configs as per your needs.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Flow:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="47433-flow.png" style="width: 2230px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19826i981AFAEB91E64238/image-size/medium?v=v2&amp;amp;px=400" role="button" title="47433-flow.png" alt="47433-flow.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Reference xml:-&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/47434-date-155661.xml" target="_blank"&gt;date-155661.xml&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Let me know if you have any issues..!!&lt;/P&gt;&lt;P&gt;If the Answer helped to resolve your issue, &lt;STRONG&gt;Click on Accept button below to accept the answer,&lt;/STRONG&gt; That would be great help to Community users to find solution quickly for these kind of errors.&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 10:12:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Nifi-to-do-processing-on-CSV-file-before-inserting/m-p/175223#M73056</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2019-08-18T10:12:38Z</dc:date>
    </item>
    <item>
      <title>Re: Using Nifi to do processing on CSV file before inserting into database</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Nifi-to-do-processing-on-CSV-file-before-inserting/m-p/175224#M73057</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/users/18929/yaswanthmuppireddy.html"&gt;@Shu&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Thank you for the great detailed response. &lt;/P&gt;&lt;P&gt;The first part does work but I don't think the regex will work for my
 case. &lt;/P&gt;&lt;P&gt;(Side bit, no fault of yours, I just absolutely despise regex as its 
unreadable to me and extremely difficult to debug (if at all).)&lt;/P&gt;&lt;P&gt;I should have mentioned this, but the only thing I know about the CSV
 file is that there are X number of columns before the string. &lt;/P&gt;&lt;P&gt;So I could see something like..&lt;/P&gt;
&lt;PRE&gt;23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,Failed to fit max, attempts,(1=&amp;gt;3), fit failing entrely,(FitFailure=True),&lt;/PRE&gt;&lt;OL&gt;
&lt;/OL&gt;&lt;P&gt;The only thing I know is that there are 13 columns (commas) before 
the string and the string will always have a trailing "," (It has always
 been the last column in the row from what I have seen).&lt;/P&gt;&lt;P&gt;The other issue is I tried doing &lt;/P&gt;
&lt;PRE&gt;(.*),&lt;/PRE&gt;&lt;OL&gt;
&lt;/OL&gt;&lt;P&gt;for all of the columns so I could then put it into a database query 
to insert the data but the regex seems to blowup and not function with 
so many columns (the original data has about 150 columns in it and I 
just truncated it down here).&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2018 06:25:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Nifi-to-do-processing-on-CSV-file-before-inserting/m-p/175224#M73057</guid>
      <dc:creator>kvasko</dc:creator>
      <dc:date>2018-01-04T06:25:06Z</dc:date>
    </item>
    <item>
      <title>Re: Using Nifi to do processing on CSV file before inserting into database</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Nifi-to-do-processing-on-CSV-file-before-inserting/m-p/175225#M73058</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/2337/kvasko.html" nodeid="2337" target="_blank"&gt;@Kevin Vasko&lt;/A&gt;&lt;P&gt;If you know &lt;STRONG&gt;13 columns before the string&lt;/STRONG&gt; then you can use Replace Text processor that matches &lt;STRONG&gt;until 13th comma&lt;/STRONG&gt; then match everything after &lt;STRONG&gt;13th comma until last comma&lt;/STRONG&gt;(not including last comma).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Replace text processor Configs:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Change the below properties as&lt;/P&gt;&lt;P&gt;Search Value
&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;^((?:[^,]*,){13})(.*), //capture group 1 until 13th comma and capture group 2 after 13th comma until last comma&lt;/PRE&gt;
&lt;/DIV&gt;&lt;P&gt;Replacement Value
&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;$1"$2" //keep 1 group as is and add double quotes for 2 group&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;Maximum Buffer Size&lt;/P&gt;&lt;PRE&gt;1 MB //if your flowfile content is more than 1 MB then you need to change this value.&lt;/PRE&gt;&lt;P&gt;Replacement Strategy
&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;Regex Replace&lt;/PRE&gt;
&lt;/DIV&gt;&lt;P&gt;Evaluation Mode
&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;Line-by-Line&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Configs:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="47438-replacetext.png" style="width: 1727px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/19822iDDDFC6B8FDB5B018/image-size/medium?v=v2&amp;amp;px=400" role="button" title="47438-replacetext.png" alt="47438-replacetext.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;By using above search value property we are just replacing as is until 13th comma and matching whole content after 13th comma until last comma and enclosing whole content in double quotes(").&lt;/P&gt;&lt;P&gt;If you are thinking that &lt;STRONG&gt;regex will blow up&lt;/STRONG&gt; by using above replace text processor configurations then you can use &lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Split Text processor&lt;/STRONG&gt; before Replace Text processor to split big &lt;STRONG&gt;csv file&lt;/STRONG&gt; each &lt;STRONG&gt;flowfile having 1 line&lt;/STRONG&gt;,Then use &lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Replace Text Processor&lt;/STRONG&gt; will work with 1 small flowfile, After replace text you can use&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Merge Content Processor&lt;/STRONG&gt;(if you want to merge small flowfiles into 1 big file again).&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 10:12:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Nifi-to-do-processing-on-CSV-file-before-inserting/m-p/175225#M73058</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2019-08-18T10:12:12Z</dc:date>
    </item>
    <item>
      <title>Re: Using Nifi to do processing on CSV file before inserting into database</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Nifi-to-do-processing-on-CSV-file-before-inserting/m-p/175226#M73059</link>
      <description>&lt;P&gt;Thanks! That seems to work correctly. I'll mark this as the answer as it produces the answer I'm looking for.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jan 2018 01:25:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Using-Nifi-to-do-processing-on-CSV-file-before-inserting/m-p/175226#M73059</guid>
      <dc:creator>kvasko</dc:creator>
      <dc:date>2018-01-05T01:25:35Z</dc:date>
    </item>
  </channel>
</rss>

