Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Please see the Cloudera blog for information on the Cloudera Response to CVE-2021-4428

NIFIroute csv flowfile based on contents

Rising Star

Hi,

I got a large csv file where fields are seperated by tabulator and have approx 150 columns

I need to filter this file and split it into two routes based on a column value

status = 6 or status = 7

I have the data schema so i know each fields. But what would be the best approach to do this.

I was thinking on

1.convert it into avro with InferAvroSchema -> ConvertCSVToAvro add status value to attribute and route it with RouteOnAttribute, but i cannot find a processor to parse avro fields to attributes.

2. convert it into json and parse values to attribute, but this seems to be a complex matter to configure

Im am running NiFi - Version 1.1.0.2.1.2.0-10

Is there a best practice for handling this.

2 REPLIES 2

Super Guru

Hi @Simon Jespersen, you can use extract text processor and get status attribute to ff and then convert csvtoavro then use routeonattribute processor to split the data into 2 routes.

GetFile----->SplitText----->ExtractText---->InferAvroSchema----->ConvertCSVToAvro----->RouteOnAttribute---->(status=6),(status=7)

SplitText Processor:-

we need to extract status value as attribute for this purpose we need to split our file into each record in to seperate flowfile.

so that the input to ExtractText processor would be one record as ff.

40608-splittext.png

Connect the splits relation to ExtactText processor.

Input:-

id=10,age=10,status=6,salary=90000
id=11,age=11,status=7,salary=100000
id=12,age=12,status=8,salary=110000

output:-

We mentioned in processor configs LineSplitCount as 1 output of splittext splits the file into individual records as one record per flowfile.

ff1:-

id=10,age=10,status=6,salary=90000

ff2:-

id=11,age=11,status=7,salary=100000

ff3:-

id=12,age=12,status=8,salary=110000

ExtractText Processor:-

Evaluates one or more Regular Expressions against the content of a FlowFile. The results of those Regular Expressions are assigned to FlowFile Attributes.

We need to extract status value from ff content as attribute of ff by using Regex.

Config:-

40609-extract-text.png

Regex Property

add a new property to extract status value as attribute to the flowfile

status=(\d*)

In this processor we have extracted the status value as attribute to every flowfile.

Then use InferAvroSchema processor and ConvertAvrotoJSON and then use

RouteOnAttribute

to split the flowfiles by adding below properties.

status=6

${status:equals("6")}

status=7

${status:equals("7")}

40610-roa.png

Then make use of these two properties to connect to another processors

Flow:-

40611-flow.png

Once make sure in your flow that you have connected only the exact same relations that are in screenshot to the next processors.

Hope this will helps ...!!!

New Contributor

Do you have this flow available for download? Would be an excellent template