Created 09-22-2017 06:00 AM
Hi,
I got a large csv file where fields are seperated by tabulator and have approx 150 columns
I need to filter this file and split it into two routes based on a column value
status = 6 or status = 7
I have the data schema so i know each fields. But what would be the best approach to do this.
I was thinking on
1.convert it into avro with InferAvroSchema -> ConvertCSVToAvro add status value to attribute and route it with RouteOnAttribute, but i cannot find a processor to parse avro fields to attributes.
2. convert it into json and parse values to attribute, but this seems to be a complex matter to configure
Im am running NiFi - Version 1.1.0.2.1.2.0-10
Is there a best practice for handling this.
Created on 09-30-2017 08:51 PM - edited 08-17-2019 10:56 PM
Hi @Simon Jespersen, you can use extract text processor and get status attribute to ff and then convert csvtoavro then use routeonattribute processor to split the data into 2 routes.
GetFile----->SplitText----->ExtractText---->InferAvroSchema----->ConvertCSVToAvro----->RouteOnAttribute---->(status=6),(status=7)
SplitText Processor:-
we need to extract status value as attribute for this purpose we need to split our file into each record in to seperate flowfile.
so that the input to ExtractText processor would be one record as ff.
Connect the splits relation to ExtactText processor.
Input:-
id=10,age=10,status=6,salary=90000 id=11,age=11,status=7,salary=100000 id=12,age=12,status=8,salary=110000
output:-
We mentioned in processor configs LineSplitCount as 1 output of splittext splits the file into individual records as one record per flowfile.
ff1:-
id=10,age=10,status=6,salary=90000
ff2:-
id=11,age=11,status=7,salary=100000
ff3:-
id=12,age=12,status=8,salary=110000
ExtractText Processor:-
Evaluates one or more Regular Expressions against the content of a FlowFile. The results of those Regular Expressions are assigned to FlowFile Attributes.
We need to extract status value from ff content as attribute of ff by using Regex.
Config:-
Regex Property
add a new property to extract status value as attribute to the flowfile
status=(\d*)
In this processor we have extracted the status value as attribute to every flowfile.
Then use InferAvroSchema processor and ConvertAvrotoJSON and then use
RouteOnAttribute
to split the flowfiles by adding below properties.
status=6
${status:equals("6")}
status=7
${status:equals("7")}
Then make use of these two properties to connect to another processors
Flow:-
Once make sure in your flow that you have connected only the exact same relations that are in screenshot to the next processors.
Hope this will helps ...!!!
Created 10-06-2020 08:47 AM
Do you have this flow available for download? Would be an excellent template