Created 03-01-2022 01:53 AM
I have a flowfile that contains duplicate column names, and I want to pick a column by its index number.
Is it possible to do this with QueryRecord or any other processor?
Note: the columns will change with every new flowfile that comes in.
Created 03-02-2022 10:30 PM
Here's the flow template for those who have older NiFi versions.
Created on 03-01-2022 02:58 AM - edited 03-01-2022 03:00 AM
Hi, @sachin_32 ,
I guess this is coming as a CSV file, right?
You can achieve what you want with the following approach:
{
  "type": "record",
  "name": "SensorReading",
  "namespace": "com.cloudera.example",
  "doc": "This is a sample sensor reading",
  "fields": [
    { "name": "c1", "type": "string" },
    { "name": "c2", "type": "string" },
    { "name": "c3", "type": "string" }
  ]
}
Ensure you use a schema with the exact number of columns that your input file has.
select c1, c2, c3
from flowfile
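To make the idea concrete outside NiFi, here is a minimal Python sketch of what the schema plus query do, assuming the reader skips the header line and maps values to c1, c2, c3, ... purely by position. The function name select_by_position is hypothetical and only for illustration; it is not part of NiFi.

```python
import csv
import io


def select_by_position(csv_text, positions):
    """Return the values at the given 1-based column positions for every data row.

    The header line is skipped, so duplicate header names no longer matter.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header line; names come from position instead
    selected = []
    for row in reader:
        # Name every value by its position, like the c1, c2, c3 schema fields.
        record = {f"c{i}": value for i, value in enumerate(row, start=1)}
        selected.append([record[f"c{p}"] for p in positions])
    return selected


sample = "A,B,C,D,A\n1,2,3,4,5\n2,3,4,5,6\n"
# Picks the first and the second "A" column independently of their names.
rows = select_by_position(sample, [1, 5])
```

Because the columns are addressed as c1, c5 and so on, the two columns that share the header name A can be selected separately.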
Cheers,
André
Created 03-01-2022 03:15 AM
Thanks for your suggestion, but in this case I don't have an exact number of columns. It keeps changing with each incoming flowfile and depends entirely on the flowfile. The scenario is this: I have a few columns that I can pick directly by name, but some column names appear more than once, and for those I need something like indexing. Around 10-15 files have this kind of issue. Can you suggest something for that?
Created 03-01-2022 03:32 AM
The number of columns in the schema doesn't actually need to be exact if you're happy to ignore the ones after the last one specified in the schema.
Created on 03-01-2022 03:55 AM - edited 03-01-2022 03:56 AM
ok
Created 03-02-2022 03:21 AM
Here's a different attempt. You can send your CSV flowfile to a ReplaceText processor with the following configuration:
The Search Value is the following regular expression:
(?s)^([^,\n]*),([^,\n]*),([^,\n]*),([^,\n]*),([^,\n]*)(.*$)
And the Replacement Value is:
$1,$2,$3,$4,col_a$6
Each capture group ([^,\n]*) matches the name of one column. To keep the name of a column, put $x in the replacement, where x is the position of that column.
To give a column another name, e.g. col_a, type the new name in the replacement instead.
The last capture group, (.*$), matches everything that remains: the rest of the header line and, because the (?s) flag lets the dot match line breaks, all of the data lines after it. This way you don't need to match every single column, only the ones up to the position you want to replace.
As an example, for this input:
A,B,C,D,A
1,2,3,4,5
2,3,4,5,6
The above replacement will generate this output:
A,B,C,D,col_a
1,2,3,4,5
2,3,4,5,6
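If you want to check the expression before putting it into NiFi, the following is a rough Python sketch of the same replacement. It is only an approximation: NiFi uses Java regular expressions and $1-style references, while Python writes them as \g<1>. The helper rename_header_column is a hypothetical name; it builds the same pattern for any column position.

```python
import re

# One capture group per header column, same as in the Search Value above.
COLUMN = r"([^,\n]*)"


def rename_header_column(csv_text, position, new_name):
    """Rename the header column at 1-based `position`; all other content is kept."""
    # `position` column groups followed by one group for the rest of the content.
    pattern = "(?s)^" + ",".join([COLUMN] * position) + "(.*$)"
    # Keep the columns before `position` unchanged, then insert the new name.
    kept = [rf"\g<{i}>" for i in range(1, position)]
    replacement = ",".join(kept + [new_name]) + rf"\g<{position + 1}>"
    return re.sub(pattern, replacement, csv_text, count=1)


sample = "A,B,C,D,A\n1,2,3,4,5\n2,3,4,5,6\n"
result = rename_header_column(sample, 5, "col_a")
```

With position 5 the generated pattern is the same expression shown above, and result holds the header A,B,C,D,col_a followed by the untouched data lines. If the header has fewer columns than the requested position, the pattern does not match and the text is returned unchanged.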
HTH,
André
Created on 03-02-2022 10:41 AM - edited 03-02-2022 10:43 AM
Hello @araujo, thank you so much for your help.
One last question: if I have my attribute like :-
Created 03-02-2022 04:36 PM
Here's another attempt at this (hopefully the last one 🙂):
I created the attached example that gets a flowfile and an attribute INDEX as you described above.
It then uses an UpdateAttribute to convert the INDEX attribute into a FILTER that we can use in the QueryRecord processor.
The QueryRecord processor uses a fixed schema that has 100 columns. It's OK if your CSV has fewer columns. If the CSV can have more than 100 columns, you need to update the schema to the maximum number of columns you expect to receive in any CSV.
The output is a flowfile with the exact columns that were specified in the INDEX attribute.
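For readers who cannot open the attachment, here is a rough Python sketch of the two pieces described above. It is not the attached flow. It assumes that INDEX holds a comma-separated list of 1-based positions such as 1,3,5, that the schema fields are named c1 to c100, and that the fields are nullable so shorter CSVs still fit. All of these are assumptions for illustration, as are the function names and the record name PositionalRecord.

```python
import json


def build_positional_schema(max_columns):
    """Build an Avro-style record schema with fields c1..c<max_columns>."""
    return {
        "type": "record",
        "name": "PositionalRecord",  # hypothetical record name
        "fields": [
            # Nullable strings are an assumption, so missing trailing columns can be null.
            {"name": f"c{i}", "type": ["null", "string"]}
            for i in range(1, max_columns + 1)
        ],
    }


def index_to_column_list(index_attribute):
    """Turn an assumed INDEX value like "1,3,5" into "c1, c3, c5"."""
    positions = [int(part) for part in index_attribute.split(",") if part.strip()]
    return ", ".join(f"c{p}" for p in positions)


schema_text = json.dumps(build_positional_schema(100), indent=2)
query = f"select {index_to_column_list('1,3,5')} from flowfile"
```

Here query becomes select c1, c3, c5 from flowfile, which has the same shape as the query in the earlier reply, and schema_text can be pasted wherever the schema text is configured.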
Hope this helps.
Cheers,
Andre
Created 03-02-2022 11:02 PM
Thanks for the Help 🙂