Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Pick Column Based on Index Number

avatar
Contributor

I have the Flow File which has Duplicate Column i want to pick the Column threw index Number,
is it possible to do with Query Record or any Processor 

Note: Column Will change with every new Flow File coming 

1 ACCEPTED SOLUTION

avatar
Super Guru

Here's the flow template for those who have older nifi versions

--
Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs up button.

View solution in original post

9 REPLIES 9

avatar
Super Guru

Hi, @sachin_32 ,

 

I guess this is coming as a CSV file, right?

You can achieve what you want with the following approach:

 

  • Configure your CSV Reader to ignore and skip the header line (if any)
  • Configure your CSV Read to use the following schema:

 

{
  "type": "record",
  "name": "SensorReading",
  "namespace": "com.cloudera.example",
  "doc": "This is a sample sensor reading",
  "fields": [
    { "name": "c1", "type": "string" },
    { "name": "c2", "type": "string" },
    { "name": "c3", "type": "string" }
  ]
}​

 

 

Ensure you use a schema with the exact number of columns that your input file has.

 

  • In your QueryRecord you can then refer to the columns as c1, c2, etc...:

 

select c1, c2, c3
from flowfile​

 

 

 

 

Cheers,

André

 

--
Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs up button.

avatar
Contributor

Thanks for your Suggestion   but in this case i don't have any Exact Number of columns it will keep changing with incoming flow file it completely depends on the Flowfile And the scenario is i have few columns which can directly pick by giving the name of column  but for some column which is coming more than one for that i need to setup like indexing  and it's around 10-15 files which has this kind of issues so can you suggest for that ?

avatar
Super Guru

The number of columns in the schema doesn't actually need to be exact if you're happy to ignore the ones after the last one specified in the schema.

--
Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs up button.

avatar
Contributor

ok

avatar
Super Guru

@sachin_32 ,

 

Here one different attempt. You can send your CSV flowfile to a ReplaceText processor with the following configuration:

araujo_0-1646219751801.png

The Search Value is the following regular expression:

(?s)^([^,\n]*),([^,\n]*),([^,\n]*),([^,\n]*),([^,\n]*)(.*$)

And the Replacement Value is:

$1,$2,$3,$4,col_a$6

 

Each capture group ([^,\n]*) will match the name of one column. If you want to keep the name of that column you just replace it with $x, where x is the position of the column.

If you want to replace the column with another name, e.g. col_a, you just type the name of the new column name in the replacement instead.

 

The last capture group (.*), will match the remaining of the first line. This way you don't need to match every single column, only the ones up to the position you want to replace.

 

As an example, for this input:

A,B,C,D,A
1,2,3,4,5
2,3,4,5,6

The above replacement will generate this output:

A,B,C,D,col_a
1,2,3,4,5
2,3,4,5,6

 

HTH,

André

 

--
Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs up button.

avatar
Contributor

Hello @araujo  Thank you so much for your Help 
Last Question if I have my attribute like :-

INDEX
1,5,3,10
now what will be replacement value and in this case i have around 40 columns I want to rename only those which is present in my index Attribute and i want my column like 
1,B,3,D,5---10,f,-- till 40 
is there any way so that it don't depend on my all column name it's just replace the name as per the element in my INDEX attribute as it is and keep all columns without changing name??

avatar
Super Guru

@Sachin 

 

Here's another attempt at this (hopefully the last one 🙂 😞

 

I created the attached example that gets a flowfile and aattribute INDEX as you described above.

It then uses an UpdateAttribute to convert the INDEX attribute into a FILTER that we can use in the QueryRecord processor.

 

The QueryRecord process uses a fixed schema that has 100 columns. It's ok if your CSV has less columns. If the CSV can have more than 100 columns you need to update the schema to the maximum of columns you expect to receive in any CSV.

 

The output is a flowfile with the exact columns that were specified in the INDEX attribute.

 

Hope this helps.

 

Cheers,

Andre

 

--
Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs up button.

avatar
Super Guru

Here's the flow template for those who have older nifi versions

--
Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs up button.

avatar
Contributor

Thanks for the Help 🙂