Read a text file using the GetFile and SplitText processors, split it into lines, and filter out the text.

Explorer

I have a text file that contains lines like this:

[16 Aug 2017 12:13:50,665] :INFO :UDPListener : UDP Listener ::: Receiver Node [ 0.0.0.0/3333 ] , Sender Node [ 20f:feb:1:0:0:0:0:10e ] , Message [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)] .

I want to split this line into Date: [16 Aug 2017 12:13:50,665], Sender: [ 20f:feb:1:0:0:0:0:10e ], Receiver: [ 0.0.0.0/3333 ], and Message: [<30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]. I also want to split the message part further into sub-fields. Is it possible to do this with a regular expression, or do I have to create a custom processor? Please help me with how to do it. I also want to save the different fields to different text files for later data analysis.
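Splitting the message part into sub-fields is also a regular-expression job. The sketch below uses Python's `re` module only to illustrate the idea; the sub-field names (`pri`, `timestamp`, `host`, `pid`, `facility`, `severity`, `mnemonic`, `details`) are hypothetical, and the pattern assumes every message has the same shape as the sample line. Python writes named groups as `(?P<name>...)`, whereas Java (which NiFi uses) writes them as `(?<name>...)`.

```python
import re

# Hypothetical sub-field names. The pattern assumes the shape
# "<PRI>Mon DD HH:MM:SS host[pid]: %FACILITY-SEVERITY-MNEMONIC: details"
# seen in the sample line; other devices may log differently.
MESSAGE_RE = re.compile(
    r"<(?P<pri>\d+)>"
    r"(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+?)\[(?P<pid>\d+)\]: "
    r"%(?P<facility>[A-Z]+)-(?P<severity>\d)-(?P<mnemonic>\w+): "
    r"(?P<details>.*)"
)


def split_message(message):
    """Return the message sub-fields as a dict, or {} if the text does not match."""
    match = MESSAGE_RE.search(message)
    return match.groupdict() if match else {}


sample = ("<30>Aug 16 12:13:50 as-pp-aa[1761]: "
          "%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)")
for name, value in split_message(sample).items():
    print(f"{name}: {value}")
```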

Accepted solution

Master Guru

Hi @Sumit Sharma,

You can use the ReplaceText processor to extract and rearrange the text as required.

Set the Search Value property to:

(.+?)\s+:INFO.*Receiver Node\s+(\[.*\])\s+(?=,).*Sender Node\s+(\[.*\])\s+(?=,).*Message\s+(\[.*\])$

Set the Replacement Value property to:

Date: $1 ,sender: $3,Receve: $2, Message: $4

ReplaceText processor configuration:

[Screenshot: 40698-replace-text.png]

Input:

[16 Aug 2017 12:13:50,665] :INFO :UDPListener : UDP Listener ::: Receiver Node [ 0.0.0.0/3333 ] , Sender Node [ 20f:feb:1:0:0:0:0:10e ] , Message [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]

Output:

Date: [16 Aug 2017 12:13:50,665] ,sender: [ 20f:feb:1:0:0:0:0:10e ],Receve: [ 0.0.0.0/3333 ], Message: [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]

The processor evaluates the regex against each flowfile's content and replaces that content according to your specification.
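To check the pattern outside NiFi, here is a minimal sketch in Python, assuming Python's `re` treats this particular pattern the same way as Java's regex engine (the constructs used here, lazy quantifiers and a `(?=,)` lookahead, behave the same in both). NiFi's `$1` back-references are written `\1` in a Python replacement string.

```python
import re

# Search Value from the answer above (unchanged apart from being split over two lines).
SEARCH = (r"(.+?)\s+:INFO.*Receiver Node\s+(\[.*\])\s+(?=,).*"
          r"Sender Node\s+(\[.*\])\s+(?=,).*Message\s+(\[.*\])$")
# Replacement Value, with NiFi's $n rewritten as Python's \n.
REPLACEMENT = r"Date: \1 ,sender: \3,Receve: \2, Message: \4"

line = ("[16 Aug 2017 12:13:50,665] :INFO :UDPListener : UDP Listener ::: "
        "Receiver Node [ 0.0.0.0/3333 ] , Sender Node [ 20f:feb:1:0:0:0:0:10e ] , "
        "Message [ <30>Aug 16 12:13:50 as-pp-aa[1761]: "
        "%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]")

result = re.sub(SEARCH, REPLACEMENT, line)
print(result)

# A line ending in " ." (as in the question) cannot satisfy "(\[.*\])$",
# so it passes through unchanged.
print(re.sub(SEARCH, REPLACEMENT, line + " .") == line + " .")  # True
```

Note that the sample line in the question ends with ` .` after the closing `]`; because the pattern ends with `(\[.*\])$`, such a line is not matched and is left as it is.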


Super Collaborator

Hi @Sumit Sharma,

For the given data, the ReplaceText processor will do the job.

By tokenizing the data with the following regular expression, you can replace the text:

(?s)(^\[.*\]) :(.*?):(.*?):(.*?):(.*?):(.*?): Receiver Node(.*?), Sender Node(.*?), Message(.*?)$

and the replacement text can be:

Date : $1 , Sender: $8, Receve : $7, Message: $9
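To see which capture group holds which field in this pattern, a small Python sketch can print them (Python's `re` stands in for NiFi's Java regex here; both number groups left to right by their opening parenthesis). Groups 2 to 6 capture the pieces between the colons (`INFO`, `UDPListener`, `UDP Listener`, and two empty strings), group 7 holds the Receiver Node value and group 8 the Sender Node value, so the sender comes from `$8` and the receiver from `$7`. The captured values keep their surrounding spaces.

```python
import re

# Pattern from the reply above; the leading (?s) lets "." match newlines too.
PATTERN = re.compile(
    r"(?s)(^\[.*\]) :(.*?):(.*?):(.*?):(.*?):(.*?): "
    r"Receiver Node(.*?), Sender Node(.*?), Message(.*?)$"
)

line = ("[16 Aug 2017 12:13:50,665] :INFO :UDPListener : UDP Listener ::: "
        "Receiver Node [ 0.0.0.0/3333 ] , Sender Node [ 20f:feb:1:0:0:0:0:10e ] , "
        "Message [ <30>Aug 16 12:13:50 as-pp-aa[1761]: "
        "%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]")

m = PATTERN.match(line)
for index in range(1, 10):
    print(index, repr(m.group(index)))

# Sender from group 8, receiver from group 7 (\g<n> is Python's spelling of $n).
print(PATTERN.sub(r"Date : \g<1> , Sender: \g<8>, Receve : \g<7>, Message: \g<9>", line))
```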

[Screenshot: nifi-replacetext.png]

[Screenshot: nifi-output-data.png]

Hope this helps!

Explorer

Thank you @Shu, it works, but only for the first line; the rest of the lines remain the same. I used the GetFile processor to read the text file at /home/sumit/myfile/mylog.txt.

This time I am looking for output like this:

Date : Sender: Receiver Node Message:

[16Aug201712:13:50,665] [20f:feb:1:0:0:0:0:10e] [0.0.0.0/3333] [<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-]

[16Aug201712:13:50,665] [20f:feb:1:0:0:0:0:10e] [0.0.0.0/3333] [<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-]

[16Aug201712:13:50,665] [20f:feb:1:0:0:0:0:10e] [0.0.0.0/3333] [<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-]

[16Aug201712:13:50,665] [20f:feb:1:0:0:0:0:10e] [0.0.0.0/3333] [<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-]

It should read each line and produce this output. Thank you.

Master Guru
@Sumit Sharma

I think your flow currently looks like this:

GetFile-->SplitText (splits each line into a separate flowfile)-->ReplaceText (prepares your content)

You need the following processors to get your desired result.

Final flow:

GetFile-->SplitText (splits each line into a separate flowfile)-->ReplaceText (prepares your content)-->ExtractText (extracts the content into attributes)-->ReplaceText (writes the attribute values back as the flowfile content)-->MergeContent (merges the flowfiles into one, with a header).
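The reason for the SplitText step can be sketched in Python, using `re` as a stand-in for NiFi's Java regex and a plain list of lines as a stand-in for SplitText's one-line flowfiles. In Python's `re` without `re.MULTILINE`, `$` only matches at the very end of the text, so when the accepted answer's pattern runs over a whole three-line file only one line can match; after splitting, every line is its own input and every line is rewritten.

```python
import re

# Search Value / Replacement Value from the accepted answer, in Python syntax.
SEARCH = re.compile(r"(.+?)\s+:INFO.*Receiver Node\s+(\[.*\])\s+(?=,).*"
                    r"Sender Node\s+(\[.*\])\s+(?=,).*Message\s+(\[.*\])$")
REPLACEMENT = r"Date: \1 ,sender: \3,Receve: \2, Message: \4"

LINE = ("[16 Aug 2017 12:13:50,665] :INFO :UDPListener : UDP Listener ::: "
        "Receiver Node [ 0.0.0.0/3333 ] , Sender Node [ 20f:feb:1:0:0:0:0:10e ] , "
        "Message [ <30>Aug 16 12:13:50 as-pp-aa[1761]: "
        "%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]")
content = "\n".join([LINE] * 3)  # stand-in for the file GetFile picks up

# Whole file at once: "$" only matches at the end of the text, so one line at most.
whole = SEARCH.sub(REPLACEMENT, content).splitlines()
print(sum(l.startswith("Date:") for l in whole), "of", len(whole), "lines rewritten")

# Split first (one line per "flowfile"), then replace in each piece.
per_line = [SEARCH.sub(REPLACEMENT, l) for l in content.splitlines() if l]
print(sum(l.startswith("Date:") for l in per_line), "of", len(per_line), "lines rewritten")
```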

ExtractText processor:

Looking at your desired output, you want each value in the content stored separately, so first we need to extract the flowfile content into flowfile attributes by adding new properties to the processor:

date as

Date:\s+(.*)\s+(?=,)

Message as

Message:\s+(.*?)$

Receve as

Receve:\s+(.*?)(,)

sender as

sender:\s+(.*?)(,)
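A rough Python model of what these four properties do, assuming each one behaves like a regex search whose first capture group becomes the attribute named after the property (ExtractText also adds indexed attributes such as `date.1`; those are left out here). `extract_text` is a hypothetical helper, not a NiFi API.

```python
import re

# Dynamic properties from the answer: property (attribute) name -> regex.
PROPERTIES = {
    "date": r"Date:\s+(.*)\s+(?=,)",
    "Message": r"Message:\s+(.*?)$",
    "Receve": r"Receve:\s+(.*?)(,)",
    "sender": r"sender:\s+(.*?)(,)",
}


def extract_text(content):
    """Hypothetical model of ExtractText: for each regex that matches anywhere in
    the content, store its first capture group under the property name."""
    attributes = {}
    for name, regex in PROPERTIES.items():
        match = re.search(regex, content)
        if match:
            attributes[name] = match.group(1)
    return attributes


content = ("Date: [16 Aug 2017 12:13:50,665] ,sender: [ 20f:feb:1:0:0:0:0:10e ],"
           "Receve: [ 0.0.0.0/3333 ], Message: [ <30>Aug 16 12:13:50 as-pp-aa[1761]: "
           "%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]")
for name, value in extract_text(content).items():
    print(f"{name} = {value}")
```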

[Screenshot: 40699-extract-text.png]

Once the contents have been extracted as attributes, use the

ReplaceText processor:

Change the Replacement Value property to

${date} ${Receve} ${Message} ${sender}

Then change the Replacement Strategy property to

Always Replace

Configuration screenshot:

[Screenshot: 40702-replacetext.png]

Input:

Date: [16 Aug 2017 12:13:50,665] ,sender: [ 20f:feb:1:0:0:0:0:10e ],Receve: [ 0.0.0.0/3333 ], Message: [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]

Output:

[16 Aug 2017 12:13:50,665] [ 0.0.0.0/3333 ] [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)] [ 20f:feb:1:0:0:0:0:10e ]
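Expression Language does the substitution in this step. The sketch below is a deliberately tiny Python stand-in that only handles plain `${name}` references (no functions) and, as an assumption, yields an empty string for names that are not attributes. It shows why the names in Replacement Value must match the ExtractText property names exactly, including case: the receiver value lives in `Receve`, so `${receiver}` would come out empty.

```python
import re

attributes = {  # values ExtractText produced in the previous step
    "date": "[16 Aug 2017 12:13:50,665]",
    "sender": "[ 20f:feb:1:0:0:0:0:10e ]",
    "Receve": "[ 0.0.0.0/3333 ]",
    "Message": ("[ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: "
                "ifIndex 669, ifAdminStatus up(1)]"),
}


def evaluate(template, attrs):
    """Tiny stand-in for Expression Language: replace each plain ${name} with the
    attribute value, or with "" when no such attribute exists (an assumption)."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: attrs.get(m.group(1), ""), template)


print(evaluate("${date} ${Receve} ${Message} ${sender}", attributes))
print(repr(evaluate("${receiver}", attributes)))  # wrong name -> ''
```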

Once the values are replaced, use the

MergeContent processor:

This merges the flowfiles into one (adjust it to your requirements).

Change the following properties:

Delimiter Strategy to

Text

Header to the following (adjust it to your requirements, and press Shift+Enter to insert a newline):

Date : Sender: Receiver Node Message:

In my processor I set Minimum Group Size to 500 B, so the processor waits until the queued flowfiles add up to at least 500 B, then merges them all into one flowfile.

Input:

In my case every flowfile is 170 B, so the processor waits for 3 flowfiles, at which point the queued size is 510 B (3 × 170 B).

[16 Aug 2017 12:13:50,665] [ 0.0.0.0/3333 ] [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)] [ 20f:feb:1:0:0:0:0:10e ]

Output:

Your desired output 🙂

Date : Sender: Receiver Node Message:
[16 Aug 2017 12:13:50,665] [ 0.0.0.0/3333 ] [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)] [ 20f:feb:1:0:0:0:0:10e ]
[16 Aug 2017 12:13:50,665] [ 0.0.0.0/3333 ] [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)] [ 20f:feb:1:0:0:0:0:10e ]
[16 Aug 2017 12:13:50,665] [ 0.0.0.0/3333 ] [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)] [ 20f:feb:1:0:0:0:0:10e ]
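The size arithmetic and the header/line joining can be sketched in Python. `merge_content` is a hypothetical, simplified model: it only bins by Minimum Group Size, ignores MergeContent's other limits (such as bin age or number of entries), and assumes a newline demarcator between flowfiles. Each output line here is 170 bytes, so the first bin closes after the third flowfile at 510 B.

```python
HEADER = "Date : Sender: Receiver Node Message:\n"  # header plus the Shift+Enter newline
LINE = ("[16 Aug 2017 12:13:50,665] [ 0.0.0.0/3333 ] "
        "[ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: "
        "ifIndex 669, ifAdminStatus up(1)] [ 20f:feb:1:0:0:0:0:10e ]")


def merge_content(flowfiles, header, min_group_size=500, demarcator="\n"):
    """Hypothetical, simplified MergeContent: fill a bin until its content reaches
    min_group_size bytes, then emit header + contents joined by the demarcator."""
    merged, waiting, size = [], [], 0
    for content in flowfiles:
        waiting.append(content)
        size += len(content.encode("utf-8"))
        if size >= min_group_size:
            merged.append(header + demarcator.join(waiting))
            waiting, size = [], 0
    return merged, waiting  # `waiting` is what would stay queued


print(len(LINE.encode("utf-8")), "bytes per flowfile")  # 170 bytes per flowfile
merged, waiting = merge_content([LINE] * 4, HEADER)
print(merged[0])
print(len(waiting), "flowfile still waiting")
```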

Configuration:

[Screenshot: 40700-merge-content.png]

You can refer to the links below for configuring the MergeContent processor:

https://community.hortonworks.com/questions/64337/apache-nifi-merge-content.html

https://community.hortonworks.com/questions/88199/issue-with-nifi-merge-content-files-stay-in-the-qu...

https://stackoverflow.com/questions/34958347/mergecontent-with-nifi-inconsistent-length

Flow screenshot:

[Screenshot: 40701-flow-merge-file.png]

Explorer

And if I use the FetchFile processor, how do I configure it? I receive the error "Upstream Connections is invalid because Processor requires an upstream connection but currently has none".

Master Guru

@Sumit Sharma,

  1. Use the ListFile processor, configure it the same way as your GetFile processor (the same input directory), and connect its success relationship to the FetchFile processor.
  2. ListFile keeps state recording the timestamp up to which it has already listed files from that directory, so it only lists files newly created in that directory.
  3. To see the state of the ListFile processor, right-click the processor and click View state; to clear the state, click Clear state on the right side of the dialog.
  4. Keep the FetchFile processor at its default configuration, since it uses the ${absolute.path} and ${filename} attributes that ListFile sets.
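The state-keeping idea in points 2 and 3 can be illustrated with a small Python class. This is a hypothetical and heavily simplified model, not how ListFile is implemented: it remembers only the newest modification time it has listed, so a file that appears later with an older or equal timestamp would be missed, and a real listing processor has to handle such edge cases more carefully. The file names in the demo are made up.

```python
import os
import tempfile


class ListFileSketch:
    """Hypothetical model of ListFile-style state: only files whose modification
    time is newer than the newest one already listed are reported."""

    def __init__(self, directory):
        self.directory = directory
        self.last_listed = 0.0  # the remembered "state"

    def list_new_files(self):
        new = [e for e in os.scandir(self.directory)
               if e.is_file() and e.stat().st_mtime > self.last_listed]
        if new:
            self.last_listed = max(e.stat().st_mtime for e in new)
        # FetchFile would then read each file from ${absolute.path}/${filename}.
        return sorted(e.name for e in new)

    def clear_state(self):
        """Rough equivalent of "Clear state": everything is listed again."""
        self.last_listed = 0.0


def touch(directory, name, mtime):
    """Create an empty file and give it a fixed modification time."""
    path = os.path.join(directory, name)
    open(path, "w").close()
    os.utime(path, (mtime, mtime))


with tempfile.TemporaryDirectory() as directory:
    lister = ListFileSketch(directory)
    touch(directory, "mylog-1.txt", 1_000)
    first = lister.list_new_files()        # the first file
    second = lister.list_new_files()       # nothing new since last time
    touch(directory, "mylog-2.txt", 2_000)
    third = lister.list_new_files()        # only the new file
    lister.clear_state()
    after_clear = lister.list_new_files()  # both files again
    print(first, second, third, after_clear)
```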

The flow should be:

ListFile (success) ---> FetchFile ---> SplitText ---> ReplaceText

Explorer

Could you please send me a link on how to configure these processors? I am new to NiFi.

Master Guru
@Sumit Sharma

I don't think there are any links to share, but I have attached my flow as a template .xml file. You can download it, upload it to your NiFi instance, and change it to suit your requirements.

[Attachment: flow-extract-mergexml.xml]

You can refer to the link below for how to import a template XML file into your NiFi canvas:

https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.1.1/bk_user-guide/content/Import_Template.htm...

Explorer

Thank you for your template, it helped me. But I use the PutFile processor to save the records, and it replaces the previous record every time. I don't want the text to be replaced.

When a line matches the regular expression, the text should be appended.

Input:

  1. [16Aug201712:13:50,665]:INFO :UDPListener: UDP Listener:::ReceiverNode[0.0.0.0/3333],SenderNode[20f:feb:1:0:0:0:0:10e],Message[<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]
  2. [16Aug201712:13:50,665]:INFO :UDPListener: UDP Listener:::ReceiverNode[0.0.0.0/3333],SenderNode[20f:feb:1:0:0:0:0:10e],Message[<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]
  3. [16Aug201712:13:50,665]:INFO :UDPListener: UDP Listener:::ReceiverNode[0.0.0.0/3333],SenderNode[20f:feb:1:0:0:0:0:10e],Message[<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]

Output:

<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)

<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)

<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)
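The append-instead-of-overwrite behaviour being asked for can be illustrated outside NiFi with plain Python file handling; this does not show how to configure PutFile. `append_messages` is a hypothetical helper: it keeps the text inside `Message[...]` for every matching line and opens the output file in append mode (`"a"`), whereas mode `"w"` would truncate the file on every run, which is the replace behaviour described above.

```python
import os
import re
import tempfile

# Greedy (.*) runs to the last "]" on the line, so "[1761]" inside the message is kept.
MESSAGE_RE = re.compile(r"Message\[(.*)\]$")


def append_messages(lines, out_path):
    """Append the Message[...] text of every matching line to out_path."""
    with open(out_path, "a", encoding="utf-8") as out:  # "a" appends, "w" would overwrite
        for line in lines:
            match = MESSAGE_RE.search(line)
            if match:
                out.write(match.group(1) + "\n")


LINE = ("[16Aug201712:13:50,665]:INFO :UDPListener: UDP Listener:::"
        "ReceiverNode[0.0.0.0/3333],SenderNode[20f:feb:1:0:0:0:0:10e],"
        "Message[<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-SNMP_TRAP_LINK_UP: "
        "ifIndex 669, ifAdminStatus up(1)]")

with tempfile.TemporaryDirectory() as directory:
    out_path = os.path.join(directory, "messages.txt")
    append_messages([LINE, LINE], out_path)  # first batch
    append_messages([LINE], out_path)        # a later batch is added, not replacing
    with open(out_path, encoding="utf-8") as f:
        written = f.read().splitlines()
    print("\n".join(written))
```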