I have a large CSV file whose fields are separated by tabs and which has approximately 150 columns.
I need to filter this file and split it into two routes based on a column value:
status = 6 or status = 7
I have the data schema, so I know each field. But what would be the best approach to do this?
I was thinking of either:
1. Converting it to Avro with InferAvroSchema -> ConvertCSVToAvro, adding the status value as an attribute, and routing it with RouteOnAttribute, but I cannot find a processor that parses Avro fields into attributes.
2. Converting it to JSON and parsing the values into attributes, but this seems complex to configure.
I am running NiFi - Version 184.108.40.206.1.2.0-10
Is there a best practice for handling this?
Hi @Simon Jespersen, you can use the ExtractText processor to get the status value as a flowfile attribute, then ConvertCSVToAvro, and then the RouteOnAttribute processor to split the data into two routes.
To extract the status value as an attribute, we first need to split the file with SplitText so that each record becomes a separate flowfile.
That way the input to the ExtractText processor is one record per flowfile.
Connect the splits relationship to the ExtractText processor.
id=10,age=10,status=6,salary=90000
id=11,age=11,status=7,salary=100000
id=12,age=12,status=8,salary=110000
We set Line Split Count to 1 in the processor configuration, so the output of SplitText is the file split into individual records, one record per flowfile.
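To make the effect of the split step concrete, here is a minimal Python sketch of what "one record per flowfile" looks like for the sample input. It only imitates the behaviour outside NiFi; the function name `split_records` is hypothetical and is not part of NiFi.

```python
def split_records(content: str) -> list[str]:
    """Return one string per non-empty line, imitating one record per flowfile."""
    return [line for line in content.splitlines() if line.strip()]


# Sample input taken from the records shown above, one record per line.
sample = (
    "id=10,age=10,status=6,salary=90000\n"
    "id=11,age=11,status=7,salary=100000\n"
    "id=12,age=12,status=8,salary=110000\n"
)

records = split_records(sample)
for record in records:
    print(record)
```

Each element of `records` stands for the content of one flowfile leaving the splits relationship.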
Evaluates one or more Regular Expressions against the content of a FlowFile. The results of those Regular Expressions are assigned to FlowFile Attributes.
We need to extract the status value from the flowfile content as a flowfile attribute by using a regex.
Add a new property to extract the status value as an attribute of the flowfile.
With this processor, the status value has been extracted as an attribute on every flowfile.
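As a rough illustration of the extraction step, the Python sketch below pulls the status value out of a single record with a regular expression. The pattern `status=(\d+)` is an assumption that fits the sample records above; it is not the regex from the original processor configuration, and for a real tab-separated file the pattern would have to match the column position instead. The name `extract_status` is hypothetical.

```python
import re

# Assumed pattern for the key=value sample records; capture group 1 holds the value.
STATUS_PATTERN = re.compile(r"status=(\d+)")


def extract_status(record: str) -> dict[str, str]:
    """Return {'status': value} if the record contains a status field, else {}."""
    match = STATUS_PATTERN.search(record)
    return {"status": match.group(1)} if match else {}


attributes = extract_status("id=10,age=10,status=6,salary=90000")
print(attributes)
```

The returned dictionary plays the role of the flowfile attributes: the record content is unchanged, and the status value travels alongside it.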
Then use the InferAvroSchema and ConvertAvroToJSON processors, and then use RouteOnAttribute
to route the flowfiles by adding the properties below.
Then use these two properties as relationships to connect to other processors.
Make sure that in your flow you connect only the exact relationships shown in the screenshot to the next processors.
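The routing decision itself can be sketched in Python as follows. This is only an illustration of the logic, assuming the status attribute has already been extracted; the route names `status6` and `status7` are hypothetical. In RouteOnAttribute the equivalent would presumably be one property per route, with NiFi Expression Language conditions along the lines of `${status:equals('6')}` and `${status:equals('7')}`, and everything else going to unmatched.

```python
def route_on_status(attributes: dict[str, str]) -> str:
    """Pick a route name from the status attribute (route names are hypothetical)."""
    status = attributes.get("status")
    if status == "6":
        return "status6"
    if status == "7":
        return "status7"
    # Anything else, including a missing attribute, is left unmatched.
    return "unmatched"


# Attribute maps standing in for three flowfiles with status 6, 7 and 8.
flowfiles = [{"status": "6"}, {"status": "7"}, {"status": "8"}]
routes = [route_on_status(attrs) for attrs in flowfiles]
print(routes)
```

Records with any other status (8 in the sample data) end up unmatched and can be dropped or handled separately.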
Hope this helps!