<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Pick Column Based on Index Number in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337379#M232559</link>
    <description>&lt;P&gt;Hi,&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/94673"&gt;@sachin_32&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I guess this is coming as a CSV file, right?&lt;/P&gt;&lt;P&gt;You can achieve what you want with the following approach:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Configure your CSV Reader to skip the header line (if any)&lt;/LI&gt;&lt;LI&gt;Configure your CSV Reader to use the following schema:&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;{
  "type": "record",
  "name": "SensorReading",
  "namespace": "com.cloudera.example",
  "doc": "This is a sample sensor reading",
  "fields": [
    { "name": "c1", "type": "string" },
    { "name": "c2", "type": "string" },
    { "name": "c3", "type": "string" }
  ]
}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Ensure you use a schema with the exact number of columns that your input file has.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;In your QueryRecord you can then refer to the columns as c1, c2, etc.:&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;select c1, c2, c3
from flowfile&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cheers,&lt;/P&gt;&lt;P&gt;André&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 01 Mar 2022 11:00:17 GMT</pubDate>
    <dc:creator>araujo</dc:creator>
    <dc:date>2022-03-01T11:00:17Z</dc:date>
    <item>
      <title>Pick Column Based on Index Number</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337361#M232556</link>
      <description>&lt;P&gt;I have a flow file that contains duplicate column names, and I want to pick a column by its index number.&lt;BR /&gt;Is it possible to do this with QueryRecord or any other processor?&amp;nbsp;&lt;/P&gt;&lt;P&gt;Note: the columns will change with every new incoming flow file.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2022 09:53:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337361#M232556</guid>
      <dc:creator>sachin_32</dc:creator>
      <dc:date>2022-03-01T09:53:56Z</dc:date>
    </item>
    <item>
      <title>Re: Pick Column Based on Index Number</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337379#M232559</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/94673"&gt;@sachin_32&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I guess this is coming as a CSV file, right?&lt;/P&gt;&lt;P&gt;You can achieve what you want with the following approach:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Configure your CSV Reader to skip the header line (if any)&lt;/LI&gt;&lt;LI&gt;Configure your CSV Reader to use the following schema:&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;{
  "type": "record",
  "name": "SensorReading",
  "namespace": "com.cloudera.example",
  "doc": "This is a sample sensor reading",
  "fields": [
    { "name": "c1", "type": "string" },
    { "name": "c2", "type": "string" },
    { "name": "c3", "type": "string" }
  ]
}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Ensure you use a schema with the exact number of columns that your input file has.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;In your QueryRecord you can then refer to the columns as c1, c2, etc.:&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;select c1, c2, c3
from flowfile&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cheers,&lt;/P&gt;&lt;P&gt;André&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2022 11:00:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337379#M232559</guid>
      <dc:creator>araujo</dc:creator>
      <dc:date>2022-03-01T11:00:17Z</dc:date>
    </item>
    <item>
      <title>Re: Pick Column Based on Index Number</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337382#M232560</link>
      <description>&lt;P&gt;Thanks for your suggestion, but in this case I don't have an exact number of columns. It keeps changing and depends entirely on the incoming flow file. The scenario is this: I have a few columns that I can pick directly by name, but some column names appear more than once, and for those I need something like indexing. Around 10-15 files have this kind of issue. Can you suggest something for that?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2022 11:15:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337382#M232560</guid>
      <dc:creator>sachin_32</dc:creator>
      <dc:date>2022-03-01T11:15:02Z</dc:date>
    </item>
    <item>
      <title>Re: Pick Column Based on Index Number</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337385#M232562</link>
      <description>&lt;P&gt;The number of columns in the schema doesn't actually need to be exact if you're happy to ignore the ones after the last one specified in the schema.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2022 11:32:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337385#M232562</guid>
      <dc:creator>araujo</dc:creator>
      <dc:date>2022-03-01T11:32:47Z</dc:date>
    </item>
    <item>
      <title>Re: Pick Column Based on Index Number</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337387#M232564</link>
      <description>&lt;P&gt;ok&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2022 11:56:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337387#M232564</guid>
      <dc:creator>sachin_32</dc:creator>
      <dc:date>2022-03-01T11:56:40Z</dc:date>
    </item>
    <item>
      <title>Re: Pick Column Based on Index Number</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337472#M232585</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/94673"&gt;@sachin_32&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is a different attempt. You can send your CSV flowfile to a ReplaceText processor with the following configuration:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="araujo_0-1646219751801.png" style="width: 606px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/33744i11F1F827BFD73C76/image-dimensions/606x297?v=v2" width="606" height="297" role="button" title="araujo_0-1646219751801.png" alt="araujo_0-1646219751801.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The Search Value is the following regular expression:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;(?s)^([^,\n]*),([^,\n]*),([^,\n]*),([^,\n]*),([^,\n]*)(.*$)&lt;/LI-CODE&gt;&lt;P&gt;And the Replacement Value is:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;$1,$2,$3,$4,col_a$6&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Each capture group&amp;nbsp;([^,\n]*) will match the name of one column. If you want to keep the name of that column, you just replace it with $x, where x is the position of the column.&lt;/P&gt;&lt;P&gt;If you want to replace the column with another name, e.g. col_a, you just type the new column name in the replacement instead.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The last capture group, (.*$), will match everything that remains: the rest of the first line and, because the (?s) flag lets the dot match newlines, all the data rows after it. This way you don't need to match every single column, only the ones up to the position you want to replace.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As an example, for this input:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;A,B,C,D,A
1,2,3,4,5
2,3,4,5,6&lt;/LI-CODE&gt;&lt;P&gt;The above replacement will generate this output:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;A,B,C,D,col_a
1,2,3,4,5
2,3,4,5,6&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;HTH,&lt;/P&gt;&lt;P&gt;André&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2022 11:21:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337472#M232585</guid>
      <dc:creator>araujo</dc:creator>
      <dc:date>2022-03-02T11:21:33Z</dc:date>
    </item>
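<!--
A minimal sketch for trying the ReplaceText approach from the reply above outside NiFi, using Python's re module. The search pattern is copied from the post. The replacement is the post's value rewritten in Python's \g<n> group syntax, and the function name rename_fifth_column is made up for this illustration. It assumes the whole file is matched as a single string, which is what the example output in the post implies.

```python
import re

# Search Value quoted in the post, written as a Python raw string.
SEARCH = r"(?s)^([^,\n]*),([^,\n]*),([^,\n]*),([^,\n]*),([^,\n]*)(.*$)"

# Python spelling of the post's Replacement Value "$1,$2,$3,$4,col_a$6".
REPLACEMENT = r"\g<1>,\g<2>,\g<3>,\g<4>,col_a\g<6>"


def rename_fifth_column(csv_text: str) -> str:
    """Rename the fifth header column to col_a and keep everything else.

    Hypothetical helper: it mirrors the ReplaceText settings from the post,
    applied once to the entire text.
    """
    return re.sub(SEARCH, REPLACEMENT, csv_text, count=1)


sample = "A,B,C,D,A\n1,2,3,4,5\n2,3,4,5,6"
renamed = rename_fifth_column(sample)
print(renamed)
```

Because the pattern is anchored at the start of the text and only the first five header fields are captured individually, any further columns and all data rows pass through group 6 untouched.
-->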
    <item>
      <title>Re: Pick Column Based on Index Number</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337515#M232603</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/11191"&gt;@araujo&lt;/a&gt;, thank you so much for your help.&amp;nbsp;&lt;BR /&gt;One last question: suppose I have an attribute like this:&lt;/P&gt;&lt;DIV class="attribute-name"&gt;INDEX&lt;/DIV&gt;&lt;DIV class="attribute-value"&gt;1,5,3,10&lt;BR /&gt;What would the replacement value be now? In this case I have around&amp;nbsp;40 columns, and I want to rename only those that are listed&amp;nbsp;in my INDEX attribute, so that my columns look like&amp;nbsp;&lt;BR /&gt;1,B,3,D,5---10,f,-- till 40&amp;nbsp;&lt;BR /&gt;Is there any way to do this without depending on all my column names, so that it just replaces the names at the positions listed in my INDEX attribute and keeps all the other columns with their names unchanged?&lt;/DIV&gt;</description>
      <pubDate>Wed, 02 Mar 2022 18:43:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337515#M232603</guid>
      <dc:creator>sachin_32</dc:creator>
      <dc:date>2022-03-02T18:43:12Z</dc:date>
    </item>
    <item>
      <title>Re: Pick Column Based on Index Number</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337530#M232606</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/26682"&gt;@Sachin&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here's another attempt at this (hopefully the last one &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I created the attached example, which takes a flowfile and an INDEX attribute as you described above.&lt;/P&gt;&lt;P&gt;It then uses an UpdateAttribute to convert the INDEX attribute into a FILTER that we can use in the QueryRecord processor.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The QueryRecord processor uses a fixed schema that has 100 columns. It's OK if your CSV has fewer columns. If the CSV can have more than 100 columns, you need to update the schema to the maximum number of columns you expect to receive in any CSV.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The output is a flowfile with exactly the columns that were specified in the INDEX attribute.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope this helps.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cheers,&lt;/P&gt;&lt;P&gt;Andre&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 03 Mar 2022 00:36:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337530#M232606</guid>
      <dc:creator>araujo</dc:creator>
      <dc:date>2022-03-03T00:36:15Z</dc:date>
    </item>
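<!--
The attached flow is not reproduced in this thread, so the following is only a rough sketch of the idea described in the reply above, under stated assumptions. It assumes positional field names c1, c2, ... as in the earlier schema example, and it assumes the INDEX attribute is turned into a plain column list for a QueryRecord-style query. The names positional_schema, index_to_query and PositionalRow are hypothetical. In NiFi the conversion would be done in UpdateAttribute rather than in Python.

```python
import json


def positional_schema(max_columns: int) -> str:
    """Build an Avro record schema with string fields c1..c<max_columns>.

    PositionalRow is a placeholder record name chosen for this sketch.
    """
    fields = [{"name": f"c{i}", "type": "string"} for i in range(1, max_columns + 1)]
    schema = {"type": "record", "name": "PositionalRow", "fields": fields}
    return json.dumps(schema, indent=2)


def index_to_query(index_attribute: str) -> str:
    """Turn an INDEX value such as "1,5,3,10" into a select over c<n> columns."""
    positions = [int(p) for p in index_attribute.split(",") if p.strip()]
    columns = ", ".join(f"c{p}" for p in positions)
    return f"select {columns} from flowfile"


schema_json = positional_schema(100)
query = index_to_query("1,5,3,10")
print(query)
```

Generating the schema from a single maximum column count keeps it easy to raise the limit if a CSV with more than 100 columns ever shows up.
-->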
    <item>
      <title>Re: Pick Column Based on Index Number</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337562#M232627</link>
      <description>&lt;P&gt;Here's the flow template for those who have older NiFi versions.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Mar 2022 06:30:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337562#M232627</guid>
      <dc:creator>araujo</dc:creator>
      <dc:date>2022-03-03T06:30:19Z</dc:date>
    </item>
    <item>
      <title>Re: Pick Column Based on Index Number</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337569#M232632</link>
      <description>&lt;P&gt;Thanks for the Help &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 03 Mar 2022 07:02:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pick-Column-Based-on-Index-Number/m-p/337569#M232632</guid>
      <dc:creator>sachin_32</dc:creator>
      <dc:date>2022-03-03T07:02:26Z</dc:date>
    </item>
  </channel>
</rss>

