Read a text file using the GetFile processor, split each line with the SplitText processor, and filter out the text.
Created ‎10-05-2017 01:11 PM
I have a text file that contains lines like:
[16 Aug 2017 12:13:50,665] :INFO :UDPListener : UDP Listener ::: Receiver Node [ 0.0.0.0/3333 ] , Sender Node [ 20f:feb:1:0:0:0:0:10e ] , Message [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)] .
I want to split this line into Date: [16 Aug 2017 12:13:50,665], Sender: [ 20f:feb:1:0:0:0:0:10e ], Receve: [ 0.0.0.0/3333 ], and Message: [<30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]. I also want to split the message part further into sub-fields. Is it possible to do this with a regular expression, or do I have to create a custom processor? I also want to save the different fields to different text files for later data analysis. Please help me with how I can do this.
Created on ‎10-05-2017 03:08 PM - edited ‎08-17-2019 09:32 PM
Hi @Sumit Sharma,
You can use the ReplaceText processor to extract and rearrange the text as required.
Set the Search Value property to:-
(.+?)\s+:INFO.*Receiver Node\s+(\[.*\])\s+(?=,).*Sender Node\s+(\[.*\])\s+(?=,).*Message\s+(\[.*\])$
Set the Replacement Value property to:-
Date: $1 ,sender: $3,Receve: $2, Message: $4
Input :-
[16 Aug 2017 12:13:50,665] :INFO :UDPListener : UDP Listener ::: Receiver Node [ 0.0.0.0/3333 ] , Sender Node [ 20f:feb:1:0:0:0:0:10e ] , Message [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]
Output:-
Date: [16 Aug 2017 12:13:50,665] ,sender: [ 20f:feb:1:0:0:0:0:10e ],Receve: [ 0.0.0.0/3333 ], Message: [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]
So this processor matches the content of each flowfile against the regular expression and replaces it with the format you specified.
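The Search Value and Replacement Value above can be checked outside NiFi. Below is a minimal sketch in Python, assuming that Python's `re` module treats this particular pattern the same way NiFi's Java regex engine does (the two are close but not identical); the `$1` references are written as `\1` because that is Python's replacement syntax.

```python
import re

# Search Value from the post, split over two string literals for readability.
SEARCH = re.compile(
    r"(.+?)\s+:INFO.*Receiver Node\s+(\[.*\])\s+(?=,).*"
    r"Sender Node\s+(\[.*\])\s+(?=,).*Message\s+(\[.*\])$"
)

# Replacement Value from the post, with $n rewritten as \n for Python.
REPLACEMENT = r"Date: \1 ,sender: \3,Receve: \2, Message: \4"

# Sample input line from the post.
line = (
    "[16 Aug 2017 12:13:50,665] :INFO :UDPListener : UDP Listener ::: "
    "Receiver Node [ 0.0.0.0/3333 ] , Sender Node [ 20f:feb:1:0:0:0:0:10e ] , "
    "Message [ <30>Aug 16 12:13:50 as-pp-aa[1761]: "
    "%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]"
)

# Group 2 is the receiver and group 3 is the sender, which is why the
# replacement uses \3 before \2.
result = SEARCH.sub(REPLACEMENT, line)
print(result)
```

Running it on the sample line reproduces the Output shown above, which is a quick way to test a pattern before pasting it into the processor.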
Created ‎10-05-2017 02:27 PM
Hi @Sumit Sharma,
For the given data, the ReplaceText processor will do the job.
By tokenizing the data with the regular expression below, you can replace the text:
(?s)(^\[.*\]) :(.*?):(.*?):(.*?):(.*?):(.*?): Receiver Node(.*?), Sender Node(.*?), Message(.*?)$
and the replacement text for it can be:
Date : $1 , Sender: $8, Receve : $7, Message: $9
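With nine capture groups it is easy to lose track of which number holds which field. The following is a small sketch in Python, assuming Python's `re` module matches this pattern the same way NiFi's Java regex engine does; it prints the groups that carry the date, receiver, sender and message for the sample line.

```python
import re

# Regular expression from the post, split over two string literals.
PATTERN = re.compile(
    r"(?s)(^\[.*\]) :(.*?):(.*?):(.*?):(.*?):(.*?): Receiver Node(.*?),"
    r" Sender Node(.*?), Message(.*?)$"
)

# Sample input line from the thread.
line = (
    "[16 Aug 2017 12:13:50,665] :INFO :UDPListener : UDP Listener ::: "
    "Receiver Node [ 0.0.0.0/3333 ] , Sender Node [ 20f:feb:1:0:0:0:0:10e ] , "
    "Message [ <30>Aug 16 12:13:50 as-pp-aa[1761]: "
    "%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]"
)

match = PATTERN.search(line)

# Groups 2 to 6 hold the colon-separated pieces between the date and
# "Receiver Node"; the fields of interest are groups 1, 7, 8 and 9.
fields = {number: match.group(number).strip() for number in (1, 7, 8, 9)}
for number, value in fields.items():
    print(f"${number} -> {value}")
```

Group 7 follows the literal `Receiver Node` and group 8 follows `Sender Node`, so the receiver is `$7` and the sender is `$8`.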
Hope this helps !!
Created ‎10-05-2017 05:55 PM
Thank you @Shu, it works, but only for the first line; the rest of the lines remain the same. I used the GetFile processor to read the text file located at /home/sumit/myfile/mylog.txt
This time I am looking for output like:
Date : Sender: Receiver Node Message:
[16Aug201712:13:50,665] [20f:feb:1:0:0:0:0:10e] [0.0.0.0/3333] [<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-]
[16Aug201712:13:50,665] [20f:feb:1:0:0:0:0:10e] [0.0.0.0/3333] [<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-]
[16Aug201712:13:50,665] [20f:feb:1:0:0:0:0:10e] [0.0.0.0/3333] [<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-]
[16Aug201712:13:50,665] [20f:feb:1:0:0:0:0:10e] [0.0.0.0/3333] [<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-]
It should read each line and produce the output. Thank you.
Created on ‎10-05-2017 07:41 PM - edited ‎08-17-2019 09:31 PM
I think right now your flow looks like
GetFile-->SplitText(splits each line into a separate flowfile)-->ReplaceText(to prepare your content)
You need the processors below to get your desired result.
Final Flow:-
GetFile-->SplitText(splits each line into a separate flowfile)-->ReplaceText(to prepare your content)-->ExtractText(to get contents as attributes)-->ReplaceText(to replace the flowfile content with the attribute values)-->MergeContent(to merge the flowfiles into one with a header).
ExtractText processor:-
Looking at your output, you want each value in the content stored separately. For that, we first need to extract the contents of the flowfile as attributes of the flowfile,
by adding new properties to the processor:
date as
Date:\s+(.*)\s+(?=,)
Message as
Message:\s+(.*?)$
Receve as
Receve:\s+(.*?)(,)
sender as
sender:\s+(.*?)(,)
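As a rough check of these four properties, here is a minimal sketch in Python that applies each regex to the sample content and keeps the first capture group under the property name. It only approximates what ExtractText does (the processor also writes indexed attributes such as `date.1`), and it assumes Python's `re` module handles these patterns the same way the Java engine does; `extract_attributes` is a hypothetical helper name.

```python
import re

# Dynamic properties from the post: attribute name -> regular expression.
PROPERTIES = {
    "date": r"Date:\s+(.*)\s+(?=,)",
    "Message": r"Message:\s+(.*?)$",
    "Receve": r"Receve:\s+(.*?)(,)",
    "sender": r"sender:\s+(.*?)(,)",
}

# Flowfile content produced by the earlier ReplaceText step.
content = (
    "Date: [16 Aug 2017 12:13:50,665] ,sender: [ 20f:feb:1:0:0:0:0:10e ],"
    "Receve: [ 0.0.0.0/3333 ], Message: [ <30>Aug 16 12:13:50 as-pp-aa[1761]: "
    "%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]"
)


def extract_attributes(text):
    """Return {attribute name: first capture group} for every matching regex."""
    attributes = {}
    for name, pattern in PROPERTIES.items():
        match = re.search(pattern, text)
        if match:
            attributes[name] = match.group(1)
    return attributes


attributes = extract_attributes(content)
for name, value in attributes.items():
    print(f"{name} = {value}")
```

The attribute names are case-sensitive and must match whatever is referenced later with `${...}`.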
Once the contents of the flowfile are extracted as attributes, we need to use
ReplaceText Processor:-
Change Replacement Value to
${date} ${Receve} ${Message} ${sender}
then change the Replacement Strategy property to
Always Replace
Input:-
Date: [16 Aug 2017 12:13:50,665] ,sender: [ 20f:feb:1:0:0:0:0:10e ],Receve: [ 0.0.0.0/3333 ], Message: [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]
output:-
[16 Aug 2017 12:13:50,665] [ 0.0.0.0/3333 ] [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)] [ 20f:feb:1:0:0:0:0:10e ]
Once the values are replaced, use
MergeContent processor:-
To merge the flowfiles into one (depends on your requirement).
Change the properties below:
Delimiter Strategy to
Text
Header to (as per your requirements; press Shift+Enter to insert a new line)
Date : Sender: Receiver Node Message:
In my processor I set Minimum Group Size to 500 B, so the processor waits until the queue in front of MergeContent reaches 500 B, then merges all the queued flowfiles into one and emits the merged flowfile.
Input:-
In my case every flowfile is 170 B, so the processor waits for 3 flowfiles, at which point the queue size is 510 B (3 x 170 B).
[16 Aug 2017 12:13:50,665] [ 0.0.0.0/3333 ] [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)] [ 20f:feb:1:0:0:0:0:10e ]
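The waiting behaviour can be pictured with a much simplified model in Python. This is only a sketch of the minimum-size rule described in this post, not MergeContent's actual binning logic (which also considers entry counts, bin age and other settings); `merge_when_full` is a hypothetical helper, and joining the flowfiles with a newline is an assumption.

```python
MIN_GROUP_SIZE = 500  # bytes, the Minimum Group Size used in the post
HEADER = "Date : Sender: Receiver Node Message:\n"

# One flowfile's content after the second ReplaceText step.
flowfile = (
    "[16 Aug 2017 12:13:50,665] [ 0.0.0.0/3333 ] "
    "[ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: "
    "ifIndex 669, ifAdminStatus up(1)] [ 20f:feb:1:0:0:0:0:10e ]"
)


def merge_when_full(flowfiles, min_size=MIN_GROUP_SIZE, header=HEADER):
    """Return the merged text once the queued bytes reach min_size, else None."""
    binned, queued_bytes = [], 0
    for content in flowfiles:
        binned.append(content)
        queued_bytes += len(content.encode("utf-8"))
        if queued_bytes >= min_size:
            # Assumption: flowfiles are joined with a newline after the header.
            return header + "\n".join(binned)
    return None  # not enough data queued yet, keep waiting


print(len(flowfile.encode("utf-8")), "bytes per flowfile")
print(merge_when_full([flowfile] * 2))  # two flowfiles stay below 500 B
print(merge_when_full([flowfile] * 3))  # the third flowfile crosses 500 B
```

With two flowfiles queued nothing is emitted; the third one pushes the total past the minimum and triggers the merge.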
Output:-
your desired output 🙂
Date : Sender: Receiver Node Message:
[16 Aug 2017 12:13:50,665] [ 0.0.0.0/3333 ] [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)] [ 20f:feb:1:0:0:0:0:10e ]
[16 Aug 2017 12:13:50,665] [ 0.0.0.0/3333 ] [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)] [ 20f:feb:1:0:0:0:0:10e ]
[16 Aug 2017 12:13:50,665] [ 0.0.0.0/3333 ] [ <30>Aug 16 12:13:50 as-pp-aa[1761]: %DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)] [ 20f:feb:1:0:0:0:0:10e ]
You can refer to the links below to configure the MergeContent processor:
https://community.hortonworks.com/questions/64337/apache-nifi-merge-content.html
https://stackoverflow.com/questions/34958347/mergecontent-with-nifi-inconsistent-length
Created ‎10-05-2017 06:06 PM
And if I use the FetchFile processor, how should I configure it? I get the error "Upstream Connections is invalid because Processor requires an upstream connection but currently has none".
Created ‎10-05-2017 06:29 PM
- Use the ListFile processor, configured the same way as your GetFile processor, and connect its success relationship to the FetchFile processor.
- ListFile keeps state: it records the timestamp up to which it has already listed files from the directory, and on later runs it lists only the new files created in that directory.
- To see the state of the ListFile processor, right-click the processor and click View State. To clear the state, click Clear State on the right of the screen.
- Keep the FetchFile processor at its default configuration, as it gets the ${absolute.path} and ${filename} attribute values from the ListFile processor.
Flow should be:-
ListFile(success)---> FetchFile--->SplitText--->ReplaceText
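The list-then-fetch split can be sketched in plain Python. This is only an illustration of the idea of keeping a timestamp as state, not how ListFile is implemented; `list_new_files`, `fetch_file` and the file name are hypothetical.

```python
import os
import tempfile


def list_new_files(directory, last_seen_mtime):
    """Return (paths modified after last_seen_mtime, updated state)."""
    new_paths = []
    newest = last_seen_mtime
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        mtime = os.path.getmtime(path)
        if mtime > last_seen_mtime:
            new_paths.append(path)
            newest = max(newest, mtime)
    return new_paths, newest


def fetch_file(path):
    """Read the content of one listed file, like the fetch step of the flow."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as directory:
    log_path = os.path.join(directory, "mylog.txt")  # hypothetical file name
    with open(log_path, "w", encoding="utf-8") as handle:
        handle.write("first line\n")
    os.utime(log_path, (1000, 1000))  # fixed timestamp for a repeatable demo

    state = 0.0
    listed, state = list_new_files(directory, state)
    print([fetch_file(path) for path in listed])  # the file is picked up once

    listed_again, state = list_new_files(directory, state)
    print(listed_again)  # nothing new, so nothing is listed again
```

Clearing the stored timestamp (setting `state` back to `0.0` here) makes the same files eligible again, which corresponds to clearing the processor state.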
Created ‎10-05-2017 07:47 PM
Could you please send me a link on how to configure these processors? I am new to NiFi.
Created ‎10-05-2017 11:28 PM
I don't think there are any links to share, but I have attached my .xml template file. You can download it, upload it to your NiFi instance, and change it to fit your requirements.
You can refer to the link below for how to import an XML file into your NiFi canvas.
Created ‎10-11-2017 07:07 PM
Thank you for your template, it helped me. However, when I use the PutFile processor to save the records, it replaces the previous record every time. I don't want it to replace the text;
whenever a line matches the regular expression, it should append the text.
The input is:
- [16Aug201712:13:50,665]:INFO :UDPListener: UDP Listener:::ReceiverNode[0.0.0.0/3333],SenderNode[20f:feb:1:0:0:0:0:10e],Message[<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]
- [16Aug201712:13:50,665]:INFO :UDPListener: UDP Listener:::ReceiverNode[0.0.0.0/3333],SenderNode[20f:feb:1:0:0:0:0:10e],Message[<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]
- [16Aug201712:13:50,665]:INFO :UDPListener: UDP Listener:::ReceiverNode[0.0.0.0/3333],SenderNode[20f:feb:1:0:0:0:0:10e],Message[<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)]
Output is :
<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)
<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)
<30>Aug1612:13:50as-pp-aa[1761]:%DAEMON-6-SNMP_TRAP_LINK_UP: ifIndex 669, ifAdminStatus up(1)
