Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

How to extract the first 5 records from a flow file using a NiFi processor?

New Member

Hi Team,

I have a requirement where I need to extract the first 5 records from a file (Sample.CSV; the file contains 100 rows with 5 columns per row).

If the 2nd column of every one of those 5 records contains the value "Yes", I want to add an attribute to that file, "Is_valid=Y"; otherwise "Is_valid=N".

Ex:

India,YES,Asia

USA,YES,USA

UK,YES,UK

India1,YES,Asia

USA1,YES,USA

I built the following flow, and it works at the record level:

GetFile -> Split Line -> ExtractText -> RouteOnAttribute -> UpdateAttribute

But I don't want to run this check on every record. I only need to check the first 5 records and then assign the valid flag to the file as a whole.

Please help me with this.

1 ACCEPTED SOLUTION


Hi @Saminathan A

One thing you can do is drop the SplitLine processor and go straight to the ExtractText processor, where a regex can pull out the first 5 lines. You can then work with that regex's capture groups (one group per line, so the first 5 lines) in the UpdateAttribute processor. This regex should work for you: ^(.*)\n(.*)\n(.*)\n(.*)\n(.*)\n.*
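To see what that regex captures, here is a minimal sketch in plain Python rather than NiFi. It assumes Unix line endings and uses a made-up sixth row (Japan,NO,Asia) that is not in the original sample; the helper name first_five_lines is hypothetical. Python's regex engine is used here only as a stand-in, so behavior inside ExtractText may differ in details.

```python
import re

# Regex from the answer: five captured lines, followed by the start of a sixth.
FIRST_FIVE = re.compile(r"^(.*)\n(.*)\n(.*)\n(.*)\n(.*)\n.*")


def first_five_lines(content):
    """Return the first five lines as a list, or None if the regex does not match."""
    match = FIRST_FIVE.match(content)
    return list(match.groups()) if match else None


# Sample rows from the question, plus one extra illustrative row (assumption).
sample = (
    "India,YES,Asia\n"
    "USA,YES,USA\n"
    "UK,YES,UK\n"
    "India1,YES,Asia\n"
    "USA1,YES,USA\n"
    "Japan,NO,Asia\n"
)

lines = first_five_lines(sample)
for number, line in enumerate(lines, start=1):
    print(number, line)
```

Each capture group holds one whole line, so the second column still has to be picked out of each group (for example by splitting on the comma) before it can be compared.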


2 REPLIES 2


New Member

Thanks, Brandon Wilson.

I tried your suggestion and it works for me, with a small correction to the regex.

The one below works for me (please enable the multi-line option in the ExtractText configuration):

regex: (.*)\n(.*)\n(.*)\n(.*)\n(.*)
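For anyone who wants to try the whole check outside NiFi, here is a rough Python sketch that applies this five-group regex and derives the Is_valid value from the second column. It is not the NiFi flow itself: the function name is_valid_flag is hypothetical, the comparison ignores case because the question says "Yes" while the sample data says "YES" (an assumption), and a file with fewer than five usable rows is treated as "N" (also an assumption). The ExtractText multi-line option is a NiFi setting and is not modelled here.

```python
import re

# Five-group regex from the reply above; each group is one line of the file.
FIVE_LINES = re.compile(r"(.*)\n(.*)\n(.*)\n(.*)\n(.*)")


def second_column(line):
    """Return the 2nd comma-separated field of a line, or '' if it has none."""
    fields = line.split(",")
    return fields[1].strip() if len(fields) > 1 else ""


def is_valid_flag(content):
    """Return 'Y' if the 2nd column of each of the first 5 lines is YES, else 'N'."""
    match = FIVE_LINES.search(content)
    if match is None:
        return "N"  # fewer than five lines: treated as invalid (assumption)
    # Case-insensitive comparison is an assumption, see the note above.
    values = [second_column(line).upper() for line in match.groups()]
    return "Y" if all(value == "YES" for value in values) else "N"


sample = "India,YES,Asia\nUSA,YES,USA\nUK,YES,UK\nIndia1,YES,Asia\nUSA1,YES,USA\n"
print("Is_valid=" + is_valid_flag(sample))
```

Rows after the fifth are never inspected, so a "NO" further down the file does not change the flag.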