<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question How to process corrupted CSV data with NiFi in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-to-process-corrupted-CSV-data-with-NiFi/m-p/398183#M250124</link>
    <description>&lt;P&gt;My NiFi flow fails when encountering a CSV with a column containing a double quote within a string, such as:&lt;BR /&gt;&lt;STRONG&gt;"Protection from Abuse Order on file against dad Raul Martinez Lopez." NO CONTACT WITH DAD. 12/16/2014 kb.&lt;/STRONG&gt;&lt;BR /&gt;The error is occurring at the Record Reader stage. Has anyone else successfully handled CSV data with embedded double quotes?&lt;BR /&gt;From csv:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="DuyChan_0-1732787817821.png"&gt;&lt;img src="https://community.cloudera.com/skins/images/D06D7978C9C8DDCE2D280E24398D4568/responsive_peak/images/image_not_found.png" alt="DuyChan_0-1732787817821.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;My record reader config&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="DuyChan_1-1732787893026.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42833i1B4A34C04A61D33E/image-size/medium?v=v2&amp;amp;px=400" role="button" title="DuyChan_1-1732787893026.png" alt="DuyChan_1-1732787893026.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 28 Nov 2024 09:58:48 GMT</pubDate>
    <dc:creator>DuyChan</dc:creator>
    <dc:date>2024-11-28T09:58:48Z</dc:date>
    <item>
      <title>How to process corrupted CSV data with NiFi</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-process-corrupted-CSV-data-with-NiFi/m-p/398183#M250124</link>
      <description>&lt;P&gt;My NiFi flow fails when encountering a CSV with a column containing a double quote within a string, such as:&lt;BR /&gt;&lt;STRONG&gt;"Protection from Abuse Order on file against dad Raul Martinez Lopez." NO CONTACT WITH DAD. 12/16/2014 kb.&lt;/STRONG&gt;&lt;BR /&gt;The error is occurring at the Record Reader stage. Has anyone else successfully handled CSV data with embedded double quotes?&lt;BR /&gt;From csv:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="DuyChan_0-1732787817821.png"&gt;&lt;img src="https://community.cloudera.com/skins/images/D06D7978C9C8DDCE2D280E24398D4568/responsive_peak/images/image_not_found.png" alt="DuyChan_0-1732787817821.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;My record reader config&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="DuyChan_1-1732787893026.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42833i1B4A34C04A61D33E/image-size/medium?v=v2&amp;amp;px=400" role="button" title="DuyChan_1-1732787893026.png" alt="DuyChan_1-1732787893026.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Nov 2024 09:58:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-process-corrupted-CSV-data-with-NiFi/m-p/398183#M250124</guid>
      <dc:creator>DuyChan</dc:creator>
      <dc:date>2024-11-28T09:58:48Z</dc:date>
    </item>
    <item>
      <title>Re: How to process corrupted CSV data with NiFi</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-process-corrupted-CSV-data-with-NiFi/m-p/398195#M250128</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;First, if the data you posted contains real personal information, I recommend removing it and using dummy data instead. Posting personal information violates the community guidelines (see point 7 of the &lt;A href="https://community.cloudera.com/t5/custom/page/page-id/Community_Guidelines" target="_self"&gt;community guidelines&lt;/A&gt;).&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Regarding the error: you are getting it because of the property setting &lt;STRONG&gt;Quote Character = "&lt;/STRONG&gt; in the &lt;STRONG&gt;CSVReader&lt;/STRONG&gt; service. This setting means that when a column value contains one of the reserved CSV characters, such as the comma (,) that separates columns or the newline (\n) that separates records, and you don't or can't use the escape character (\), you can surround the whole column value with double quotes at both ends. &lt;STRONG&gt;This means that no further characters belonging to the same column may follow the closing quote. For more information, please refer to&lt;/STRONG&gt;: &lt;A href="https://csv-loader.com/csv-guide/why-quotation-marks-are-used-in-csv" target="_blank"&gt;https://csv-loader.com/csv-guide/why-quotation-marks-are-used-in-csv&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Because the line you listed has characters after the closing ", you are getting the illegal character error.&lt;/P&gt;&lt;P&gt;To resolve this, you have two options:&lt;/P&gt;&lt;P&gt;1- Use ReplaceText to replace every double quote character (") with \" so that the double quote is escaped. However, this might not be efficient if you have a large CSV file.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="SAMSAL_0-1732802227968.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42839iF408A9CC1662A367/image-size/medium?v=v2&amp;amp;px=400" role="button" title="SAMSAL_0-1732802227968.png" alt="SAMSAL_0-1732802227968.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;2- A more efficient option is to change the &lt;STRONG&gt;Quote Character&lt;/STRONG&gt; in the &lt;STRONG&gt;CSVReader&lt;/STRONG&gt; to something other than &lt;STRONG&gt;"&lt;/STRONG&gt;. However, you have to make sure that the new character never appears in any of the CSV values. Possible options: $, %, ^&lt;/P&gt;&lt;P&gt;If this helps, please accept the solution.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 28 Nov 2024 14:07:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-process-corrupted-CSV-data-with-NiFi/m-p/398195#M250128</guid>
      <dc:creator>SAMSAL</dc:creator>
      <dc:date>2024-11-28T14:07:50Z</dc:date>
    </item>
    <item>
      <title>Re: How to process corrupted CSV data with NiFi</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-process-corrupted-CSV-data-with-NiFi/m-p/398219#M250135</link>
      <description>&lt;P&gt;I tried to delete the data you mentioned, but I don't know how to edit the topic. Thank you very much for your support.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Nov 2024 04:27:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-process-corrupted-CSV-data-with-NiFi/m-p/398219#M250135</guid>
      <dc:creator>DuyChan</dc:creator>
      <dc:date>2024-11-29T04:27:24Z</dc:date>
    </item>
  </channel>
</rss>

