<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: CSVReader and CSVRecordSetWriter don't consider integer values present in double quotes as strings in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/CSVReader-and-CSVRecordSetWriter-doesn-t-consider-interger/m-p/399059#M250383</link>
    <description>&lt;P&gt;Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;I'm using the CSV readers and writers in the QueryRecord processor and the CSVtoJSON Record Converter processor.&lt;/P&gt;&lt;P&gt;I am expecting around 10 types of CSV files (that is, files with different columns), but this number might increase to up to 50 in the future. Even though I know exactly what these 50 files will contain, I would still have to write all 50 schemas in a schema registry, which is complex to maintain and work with.&lt;/P&gt;&lt;P&gt;So I thought of wrapping all the values in the CSV file in double quotes so that the reader and writer would treat them as strings. But they are still read as integers.&lt;/P&gt;&lt;P&gt;Input file:&lt;/P&gt;&lt;P&gt;col1,col2,col3,col4,col5&lt;BR /&gt;999,C10,100,010,0&lt;BR /&gt;999,C06,10,010,0&lt;/P&gt;&lt;P&gt;This is what the CSV writer is giving:&lt;span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="Screenshot 2024-12-19 104340.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/43134iF753D0B176AD58C8/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2024-12-19 104340.png" alt="Screenshot 2024-12-19 104340.png" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="Screenshot 2024-12-19 104409.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/43133i0B9A315B4C67CD08/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2024-12-19 104409.png" alt="Screenshot 2024-12-19 104409.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;DIV class="attribute-name"&gt;avro.schema&lt;/DIV&gt;&lt;DIV class="attribute-value"&gt;{"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"col1","type":["int","null"]},{"name":"col2","type":["int","null"]},{"name":"col3","type":["string","null"]},{"name":"col4","type":["int","null"]},{"name":"col5","type":["int","null"]}]}&lt;/DIV&gt;</description>
    <pubDate>Thu, 19 Dec 2024 05:16:18 GMT</pubDate>
    <dc:creator>phadkev</dc:creator>
    <dc:date>2024-12-19T05:16:18Z</dc:date>
    <item>
      <title>CSVReader and CSVRecordSetWriter don't consider integer values present in double quotes as strings</title>
      <link>https://community.cloudera.com/t5/Support-Questions/CSVReader-and-CSVRecordSetWriter-doesn-t-consider-interger/m-p/399018#M250374</link>
      <description>&lt;P&gt;I am using CSVReader and CSVRecordSetWriter with the Infer Schema setting. When I have values such as "030", they are inferred as type 'int' in the schema written to avro.schema, even though all the values are enclosed in double quotes. I want them to be treated as strings. Because of this, the output after the processor is 30 and the leading 0 is dropped. I want to keep using the Infer Schema setting because I want to read the CSV files dynamically, without hardcoding a schema.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Dec 2024 09:53:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/CSVReader-and-CSVRecordSetWriter-doesn-t-consider-interger/m-p/399018#M250374</guid>
      <dc:creator>phadkev</dc:creator>
      <dc:date>2024-12-18T09:53:48Z</dc:date>
    </item>
    <item>
      <title>Re: CSVReader and CSVRecordSetWriter don't consider integer values present in double quotes as strings</title>
      <link>https://community.cloudera.com/t5/Support-Questions/CSVReader-and-CSVRecordSetWriter-doesn-t-consider-interger/m-p/399027#M250376</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/121295"&gt;@phadkev&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;I'm not sure there is a way around this without using a schema. You mentioned that you want to read the CSV files dynamically, but does that mean you don't know what kind of CSV data you will get? If so, how are you parsing the data after conversion? Which processor are you using the CSVReader &amp;amp; Writer in? Maybe you can elaborate on what you are trying to do with the data from beginning to end, so we can see whether we can help in another way.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Dec 2024 15:44:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/CSVReader-and-CSVRecordSetWriter-doesn-t-consider-interger/m-p/399027#M250376</guid>
      <dc:creator>SAMSAL</dc:creator>
      <dc:date>2024-12-18T15:44:28Z</dc:date>
    </item>
    <item>
      <title>Re: CSVReader and CSVRecordSetWriter don't consider integer values present in double quotes as strings</title>
      <link>https://community.cloudera.com/t5/Support-Questions/CSVReader-and-CSVRecordSetWriter-doesn-t-consider-interger/m-p/399059#M250383</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;I'm using the CSV readers and writers in the QueryRecord processor and the CSVtoJSON Record Converter processor.&lt;/P&gt;&lt;P&gt;I am expecting around 10 types of CSV files (that is, files with different columns), but this number might increase to up to 50 in the future. Even though I know exactly what these 50 files will contain, I would still have to write all 50 schemas in a schema registry, which is complex to maintain and work with.&lt;/P&gt;&lt;P&gt;So I thought of wrapping all the values in the CSV file in double quotes so that the reader and writer would treat them as strings. But they are still read as integers.&lt;/P&gt;&lt;P&gt;Input file:&lt;/P&gt;&lt;P&gt;col1,col2,col3,col4,col5&lt;BR /&gt;999,C10,100,010,0&lt;BR /&gt;999,C06,10,010,0&lt;/P&gt;&lt;P&gt;This is what the CSV writer is giving:&lt;span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="Screenshot 2024-12-19 104340.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/43134iF753D0B176AD58C8/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2024-12-19 104340.png" alt="Screenshot 2024-12-19 104340.png" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="Screenshot 2024-12-19 104409.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/43133i0B9A315B4C67CD08/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2024-12-19 104409.png" alt="Screenshot 2024-12-19 104409.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;DIV class="attribute-name"&gt;avro.schema&lt;/DIV&gt;&lt;DIV class="attribute-value"&gt;{"type":"record","name":"nifiRecord","namespace":"org.apache.nifi","fields":[{"name":"col1","type":["int","null"]},{"name":"col2","type":["int","null"]},{"name":"col3","type":["string","null"]},{"name":"col4","type":["int","null"]},{"name":"col5","type":["int","null"]}]}&lt;/DIV&gt;</description>
      <pubDate>Thu, 19 Dec 2024 05:16:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/CSVReader-and-CSVRecordSetWriter-doesn-t-consider-interger/m-p/399059#M250383</guid>
      <dc:creator>phadkev</dc:creator>
      <dc:date>2024-12-19T05:16:18Z</dc:date>
    </item>
    <item>
      <title>Re: CSVReader and CSVRecordSetWriter don't consider integer values present in double quotes as strings</title>
      <link>https://community.cloudera.com/t5/Support-Questions/CSVReader-and-CSVRecordSetWriter-doesn-t-consider-interger/m-p/399144#M250421</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/121295"&gt;@phadkev&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Since you don't mind all the values coming through as strings, I would suggest the following:&lt;/P&gt;&lt;P&gt;1- Use the &lt;STRONG&gt;ExtractRecordSchema&lt;/STRONG&gt; processor (available in NiFi 1.26+) to generate the &lt;STRONG&gt;avro.schema&lt;/STRONG&gt; attribute with the record schema as inferred by NiFi, as follows:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="SAMSAL_0-1734618655964.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/43143iB44013B933F1E1AD/image-size/medium?v=v2&amp;amp;px=400" role="button" title="SAMSAL_0-1734618655964.png" alt="SAMSAL_0-1734618655964.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The &lt;STRONG&gt;CSVReader&lt;/STRONG&gt; for this processor uses the &lt;STRONG&gt;Infer Schema&lt;/STRONG&gt; strategy.&lt;/P&gt;&lt;P&gt;Once you pass your CSV input through this processor, the flowfile will have a new attribute, &lt;STRONG&gt;avro.schema&lt;/STRONG&gt;, with the following value:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="SAMSAL_1-1734618903447.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/43144i723F1B2077933EC3/image-size/medium?v=v2&amp;amp;px=400" role="button" title="SAMSAL_1-1734618903447.png" alt="SAMSAL_1-1734618903447.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;As expected, some of the fields, like name, are assigned the &lt;STRONG&gt;int&lt;/STRONG&gt; type, which we will take care of in the next step.&lt;/P&gt;&lt;P&gt;2- Use &lt;STRONG&gt;UpdateAttribute&lt;/STRONG&gt; to replace any &lt;STRONG&gt;int&lt;/STRONG&gt; type with the &lt;STRONG&gt;string&lt;/STRONG&gt; type inside the &lt;STRONG&gt;avro.schema&lt;/STRONG&gt; attribute, as follows:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="SAMSAL_3-1734619084414.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/43146iE67F2EE67FA576DB/image-size/medium?v=v2&amp;amp;px=400" role="button" title="SAMSAL_3-1734619084414.png" alt="SAMSAL_3-1734619084414.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The Expression Language used to reset the &lt;STRONG&gt;avro.schema&lt;/STRONG&gt; attribute:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;${avro.schema:replace("int","string")}&lt;/LI-CODE&gt;&lt;P&gt;Note that this is a plain substring replacement, so it would also change any field name that contains "int", and it leaves other inferred numeric types (such as long or double) untouched.&lt;/P&gt;&lt;P&gt;3- Use the &lt;STRONG&gt;QueryRecord&lt;/STRONG&gt; processor with a &lt;U&gt;&lt;STRONG&gt;different CSVReader&lt;/STRONG&gt;&lt;/U&gt; from the one in step 1, this one set to "&lt;STRONG&gt;Use 'Schema Text' Property&lt;/STRONG&gt;". Notice how by default the &lt;STRONG&gt;Schema Text&lt;/STRONG&gt; property is set to the &lt;STRONG&gt;avro.schema&lt;/STRONG&gt; attribute, which we generated in steps 1 &amp;amp; 2:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="SAMSAL_4-1734619473281.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/43147iCC41770E72565D2A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="SAMSAL_4-1734619473281.png" alt="SAMSAL_4-1734619473281.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Also make sure you &lt;STRONG&gt;set the same strategy for the CSVRecordSetWriter&lt;/STRONG&gt; to ensure that the CSV is read and written in the desired format.&lt;/P&gt;&lt;P&gt;Hope that helps. If it does, please accept the solution.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 19 Dec 2024 14:50:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/CSVReader-and-CSVRecordSetWriter-doesn-t-consider-interger/m-p/399144#M250421</guid>
      <dc:creator>SAMSAL</dc:creator>
      <dc:date>2024-12-19T14:50:14Z</dc:date>
    </item>
    <item>
      <title>Re: CSVReader and CSVRecordSetWriter don't consider integer values present in double quotes as strings</title>
      <link>https://community.cloudera.com/t5/Support-Questions/CSVReader-and-CSVRecordSetWriter-doesn-t-consider-interger/m-p/399182#M250434</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Thanks, this worked.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Dec 2024 06:14:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/CSVReader-and-CSVRecordSetWriter-doesn-t-consider-interger/m-p/399182#M250434</guid>
      <dc:creator>phadkev</dc:creator>
      <dc:date>2024-12-20T06:14:56Z</dc:date>
    </item>
    <item>
      <title>Re: CSVReader and CSVRecordSetWriter don't consider integer values present in double quotes as strings</title>
      <link>https://community.cloudera.com/t5/Support-Questions/CSVReader-and-CSVRecordSetWriter-doesn-t-consider-interger/m-p/399200#M250441</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/121295"&gt;@phadkev&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;I'm glad it's working for you. Something I forgot to mention is that this will perform OK if your CSV dataset is small. I'm not sure how schema inference works, but I imagine it does a full scan of the data to determine the appropriate types, and if that is the case, the operation will be costly with large data. &lt;STRONG&gt;That is why it's always recommended to know your schema and pass it when working with records.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Having said this, I was concerned about the performance of my suggestion above, so I did more research and found that &lt;STRONG&gt;there is an easier and much faster way to convert everything to string, where you don't need the extra processors.&lt;/STRONG&gt; The suggestion is based on this &lt;A href="https://stackoverflow.com/questions/72830470/how-do-i-configure-a-nifi-schema-to-convert-all-properties-to-strings-when-conve" target="_self"&gt;post.&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Basically, all you have to do in the &lt;STRONG&gt;CSVReader&lt;/STRONG&gt; is set the &lt;STRONG&gt;Schema Access Strategy&lt;/STRONG&gt; to "&lt;STRONG&gt;Use String Fields From Header&lt;/STRONG&gt;".&lt;/P&gt;&lt;P&gt;The &lt;STRONG&gt;CSVRecordSetWriter&lt;/STRONG&gt; &lt;STRONG&gt;Schema Access Strategy&lt;/STRONG&gt; should be set to "&lt;STRONG&gt;Inherit Record Schema&lt;/STRONG&gt;".&lt;/P&gt;&lt;P&gt;That should do it. Give it a try and see how it goes.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 20 Dec 2024 13:49:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/CSVReader-and-CSVRecordSetWriter-doesn-t-consider-interger/m-p/399200#M250441</guid>
      <dc:creator>SAMSAL</dc:creator>
      <dc:date>2024-12-20T13:49:17Z</dc:date>
    </item>
    <item>
      <title>Re: CSVReader and CSVRecordSetWriter don't consider integer values present in double quotes as strings</title>
      <link>https://community.cloudera.com/t5/Support-Questions/CSVReader-and-CSVRecordSetWriter-doesn-t-consider-interger/m-p/399203#M250442</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;&amp;nbsp;, This is indeed more efficient. Thanks for letting me know.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Dec 2024 15:26:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/CSVReader-and-CSVRecordSetWriter-doesn-t-consider-interger/m-p/399203#M250442</guid>
      <dc:creator>phadkev</dc:creator>
      <dc:date>2024-12-20T15:26:10Z</dc:date>
    </item>
  </channel>
</rss>

