NiFi Extraction

Rising Star

Hi guys,

I would like to ask whether this is possible in NiFi alone, without using any script execution.

test.txt <--- this is the file I need to extract values from. Please see the attached file.

I just need to get the VALUE of line1 to line4 and save it to HBase.

The problem is that one file contains multiple requests. I need to get line1 to line4 for every request.

PS. The number of lines per request varies. My file is just an example of what the file looks like.

Thank you.

1 ACCEPTED SOLUTION

@regie canada

You can use the ExtractText processor and use regex within it to pull the first 4 lines. Your regex would be:

(.*)\n(.*)\n(.*)\n(.*)

15596-screen-shot-2017-05-19-at-25904-pm.png

After that you can use the SplitText processor if you want each line to be an individual flowfile or you can use the UpdateAttribute processor to make any kind of transformations on the 4 lines.
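To check what the four-group regex above captures, here is a minimal sketch in plain Python, outside NiFi. The sample text is hypothetical (the real test.txt was an attachment and is not shown here), and Python's `re` module is only standing in for the Java regex engine that ExtractText uses; for this simple pattern the two behave the same way.

```python
import re

# Hypothetical request body; the "Line1 : value" layout is an assumption
# based on how the lines are described later in this thread.
sample = "Line1 : a\nLine2 : b\nLine3 : c\nLine4 : d\nLine5 : e\n"

# Same pattern as in the answer: four capture groups separated by newlines.
# "." does not match "\n" by default, so each group holds exactly one line.
pattern = re.compile(r"(.*)\n(.*)\n(.*)\n(.*)")

match = pattern.search(sample)
first_four = list(match.groups())
print(first_four)
```

Each group ends up holding one whole line, so the fifth line and anything after it is ignored.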

9 REPLIES 9

Rising Star

Hi sir, thanks for the reply. I need the first 4 lines after every ####################################################################### START of Request ####################################################################### marker, sir. Each file contains multiple requests.

@regie canada

As Matt suggested below, use the SplitContent processor to split the file into multiple, smaller flow files. The "byte sequence" entry for splitting would be

####################################################################### START of Request #######################################################################

After that, use the ExtractText processor, as described in my response above, to get the first 4 lines of each flow file generated by the SplitContent processor.
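The two steps above (split on the marker, then take the first 4 lines of each piece) can be sketched in plain Python to see the expected result. This is only a simulation of the idea, not NiFi configuration: the marker is shortened, the sample data is hypothetical, and `split_requests` and `extract_first_four` are made-up helper names.

```python
import re

# Shortened stand-in for the long "#### START of Request ####" marker line.
START = "### START of Request ###"

# Hypothetical two-request file; line counts differ per request,
# as described in the question.
data = (
    f"{START}\nLine1 : a\nLine2 : b\nLine3 : c\nLine4 : d\nLine5 : e\n"
    f"{START}\nLine1 : f\nLine2 : g\nLine3 : h\nLine4 : i\n"
)

first_four = re.compile(r"(.*)\n(.*)\n(.*)\n(.*)")

def split_requests(text, marker):
    """Mimic the split step: cut on the marker and drop empty pieces."""
    return [part.lstrip("\n") for part in text.split(marker) if part.strip()]

def extract_first_four(request):
    """Mimic the extract step: four captured lines, or None if fewer exist."""
    m = first_four.search(request)
    return list(m.groups()) if m else None

results = [extract_first_four(r) for r in split_requests(data, START)]
print(results)
```

One thing the sketch makes visible: if a split piece still begins with the newline that followed the marker, the first capture group comes back empty. The `lstrip("\n")` call handles that here; whether the same adjustment is needed in a real flow depends on how the split is configured, so it is worth checking the split output first.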

Super Mentor

@regie canada

I agree. My answer was only intended to show how to split your multi-record file into single records, which can then be processed along the lines of the approach @Eyad Garelnabi suggested.

Matt

Rising Star

@Matt Clarke @Eyad Garelnabi

Thank you so much!!

Super Mentor

@regie canada

Does each request in the file always start with

####################################################################### START of Request #######################################################################

and end with:

####################################################################### END of Request #######################################################################

If so, you could use the SplitContent processor to split your incoming FlowFile into multiple FlowFiles (each containing a single request). Then you could parse each of those requests for the lines/values you want.

The SplitContent processor would be configured as follows in this scenario:

[Screenshot: SplitContent processor configuration]

Do the 4 lines you want to extract the values from have a specific format?

For example do they actually start with "Line" or is that property name always dynamic in nature?

Thanks,

Matt

Rising Star

@Matt Clarke

Thanks sir. One additional question: is there a processor that will change this Line1 : value Line2 : value Line3 : value Line4 : value

to JSON format?

Thanks again.

Super Mentor
@regie canada

The ExtractText processor creates FlowFile attributes from the extracted text. NiFi has an AttributesToJSON processor you can use to generate JSON from these created attributes.
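As a rough picture of the end result, the sketch below turns four captured lines into a flat JSON object in plain Python. It is not how AttributesToJSON is implemented or configured. The captured values are hypothetical, and splitting each line into a name and a value is an extra assumption made here for illustration; in a real flow the JSON keys would be whatever attribute names the flow produces.

```python
import json
import re

# Hypothetical values, as if four lines had already been captured.
captured = ["Line1 : a", "Line2 : b", "Line3 : c", "Line4 : d"]

# Split each "name : value" line into a key and a value. Spacing around
# the colon varies in the question, so surrounding whitespace is optional.
pair = re.compile(r"\s*([^:]+?)\s*:\s*(.*)")

attributes = {}
for line in captured:
    m = pair.match(line)
    if m:
        attributes[m.group(1)] = m.group(2).strip()

# A flat JSON object with one key per attribute.
as_json = json.dumps(attributes)
print(as_json)
```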

For new questions, please open a new question. It makes it easier for community users to search for answers.

Thanks,

Matt