
How to ingest and group multiline log files with NiFi?

Rising Star

For example, I have these 9 lines as input:

24/05/2016 13:40:18,739 ERROR [ACTIVE] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)'

fr.pe.sldng.integration.rest.GestionnaireExceptionSollicitationRest Exception lors du traitement

fr.data.exception.SLDNotFoundException: Mini site inconnu

at fr.services.impl.MiniSiteServiceImpl.lire(MiniSiteServiceImpl.java:89)

at fr.services.impl.EnvoiMailSignalementDs3ServiceImpl.envoyerUnMail(EnvoiMailSignalementDs3ServiceImpl.java:60)

at fr.ressources.MailRessource.envoyerMailSignalementContenuInaproprie(MailRessource.java:41)

24/05/2016 15:40:18,739 ERROR [ACTIVE] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)'

fr.rest.GestionnaireExceptionSollicitationRest Exception lors du traitement

fr.data.exception.SLDNotFoundException: Mini site inconnu

and I'd like to use an "extract text" processor to get a property whose value begins with "24/05/2016 13:40:18,739 ERROR..." and ends just before the next timestamp "24/05/2016 15:40:18,739...", i.e. the first 6 input lines,

and another property that begins at the second timestamp and runs to the end of the input, i.e. the last three input lines.

Is it possible to do this with NiFi?

Thanks

1 ACCEPTED SOLUTION

Guru

You can do this by using ReplaceText to replace ^(\d{2}\/\d{2}\/\d{4}) with the same text prefixed by a delimiter that does not otherwise occur in the data (e.g. ~$1), i.e. prepend a marker character to each line that really starts a log entry.

You can then use SplitContent by the byte you chose to prepend with. This gives you flow files for each log entry.
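
Outside NiFi, the same mark-then-split idea can be sketched in a few lines of Python. This is only an illustration of the technique, not what the processors do internally; the function name `split_entries`, the `~` marker and the shortened sample log are assumptions made for the example.

```python
import re
from typing import List

# Same pattern as above: a dd/mm/yyyy date at the start of a line.
TIMESTAMP = re.compile(r"^(\d{2}/\d{2}/\d{4})", re.MULTILINE)
MARKER = "~"  # assumption: this character never occurs in the log itself


def split_entries(log_text: str) -> List[str]:
    """Prepend MARKER to every timestamped line, then split on MARKER."""
    marked = TIMESTAMP.sub(MARKER + r"\1", log_text)
    # The piece before the first marker is empty, so blank pieces are dropped.
    return [part for part in marked.split(MARKER) if part.strip()]


# Shortened, hypothetical sample in the same shape as the question's log.
sample = (
    "24/05/2016 13:40:18,739 ERROR first entry\n"
    "fr.data.exception.SLDNotFoundException: Mini site inconnu\n"
    "\tat fr.services.impl.MiniSiteServiceImpl.lire(MiniSiteServiceImpl.java:89)\n"
    "24/05/2016 15:40:18,739 ERROR second entry\n"
    "fr.data.exception.SLDNotFoundException: Mini site inconnu\n"
)
entries = split_entries(sample)
```

Each element of `entries` then holds one complete log entry, stack trace included, which is the role the split flow files play in the NiFi flow.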

However, this can be a little heavy. Make sure you're running the latest version of NiFi, and if you're working with large log files, you may need to consider increasing file handle limits.

The flow (template here: split-multi-line-example.xml) works for prepending and splitting. You can see here that 2 flowfiles have come out of the 5 line log file sample I put in.

4547-log-split.png


7 REPLIES 7

Rising Star

@Simon Elliston Ball

The test failed.

Before ReplaceText:

25/05/2016 08:40:18,739 ERROR [ACTIVE] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)' fr.pe.sldng.integration.rest.GestionnaireExceptionSollicitationRest Exception lors du traitement
fr.pe.empl.service.da016.recruteur.minisite.data.exception.SLDNotFoundException: Mini site inconnu
        at fr.pe.empl.service.da016.recruteur.minisite.services.impl.MiniSiteServiceImpl.lire(MiniSiteServiceImpl.java:89)
        at fr.pe.empl.service.da016.recruteur.minisite.services.impl.EnvoiMailSignalementDs3ServiceImpl.envoyerUnMail(EnvoiMailSignalementDs3ServiceImpl.java:60)
        at fr.pe.empl.service.da016.recruteur.minisite.ressources.MailRessource$Proxy$_$_WeldSubclass.envoyerMailSignalementContenuInaproprie(MailRessource$Proxy$_$_WeldSubclass.java)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
25/05/2016 08:40:18,739 ERROR [ACTIVE] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)' fr.pe.sldng.integration.rest.GestionnaireExceptionSollicitationRest Exception lors du traitement
fr.pe.empl.service.da016.recruteur.minisite.data.exception.SLDNotFoundException: Mini site inconnu

After ReplaceText, with the marker "£|£|£|":

£|£|£| 25/05/2016 08:40:18,739 ERROR [ACTIVE] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)' fr.pe.sldng.integration.rest.GestionnaireExceptionSollicitationRest Exception lors du traitement
fr.pe.empl.service.da016.recruteur.minisite.data.exception.SLDNotFoundException: Mini site inconnu
        at fr.pe.empl.service.da016.recruteur.minisite.services.impl.MiniSiteServiceImpl.lire(MiniSiteServiceImpl.java:89)
        at fr.pe.empl.service.da016.recruteur.minisite.services.impl.EnvoiMailSignalementDs3ServiceImpl.envoyerUnMail(EnvoiMailSignalementDs3ServiceImpl.java:60)
        at fr.pe.empl.service.da016.recruteur.minisite.ressources.MailRessource$Proxy$_$_WeldSubclass.envoyerMailSignalementContenuInaproprie(MailRessource$Proxy$_$_WeldSubclass.java)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
£|£|£| 25/05/2016 08:40:18,739 ERROR [ACTIVE] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)' fr.pe.sldng.integration.rest.GestionnaireExceptionSollicitationRest Exception lors du traitement
fr.pe.empl.service.da016.recruteur.minisite.data.exception.SLDNotFoundException: Mini site inconnu

But after the split processor runs, the output claim is unchanged: it does not split. Do you have any idea why? My split properties are below:

Byte Sequence Format: Text
Byte Sequence: £|£|£|
Keep Byte Sequence: false
Byte Sequence Location: Trailing

Thanks

Guru

Hi @Thierry Vernhet, I've added a template and a screenshot of a worked example, which should make it clearer. I suspect the problem you're seeing is in which relationship you use for the output of the SplitContent processor. If you use "original", or worse, both outputs, you will just get the original content back.

Note also that I've used the "Leading" location in my template, since the marker is inserted at the front of a line, and have also used Line-by-Line evaluation in the ReplaceText that inserts the marker, for better memory usage.
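
To illustrate why working line by line with a leading marker keeps memory use low, here is a rough Python analogy (not NiFi's implementation): a new entry starts whenever a line begins with a timestamp, so only the entry currently being built has to be held in memory. The name `group_entries` and the sample lines are made up for the example.

```python
import re
from typing import Iterable, Iterator, List

# A dd/mm/yyyy date at the start of a line marks the start of a new entry.
TIMESTAMP = re.compile(r"^\d{2}/\d{2}/\d{4}")


def group_entries(lines: Iterable[str]) -> Iterator[str]:
    """Yield one string per log entry while reading the input line by line."""
    current: List[str] = []
    for line in lines:
        # A timestamped line closes the entry collected so far.
        if TIMESTAMP.match(line) and current:
            yield "".join(current)
            current = []
        current.append(line)
    # Emit whatever is left once the input ends.
    if current:
        yield "".join(current)


# Hypothetical input; in practice `lines` could be an open file object.
lines = [
    "25/05/2016 08:40:18,739 ERROR first entry\n",
    "fr.data.exception.SLDNotFoundException: Mini site inconnu\n",
    "        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n",
    "25/05/2016 08:40:18,739 ERROR second entry\n",
    "fr.data.exception.SLDNotFoundException: Mini site inconnu\n",
]
grouped = list(group_entries(lines))
```

Any lines that appear before the first timestamp would come out as their own group in this sketch.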

Rising Star

Hi @Simon Elliston Ball

Thanks a lot for your answer. I understand now. But I cannot ignore the "original" relationship, because without it NiFi does not validate my processor. How can you use the "splits" relationship without the "original" one?

I hope this is my last question on this.

Guru

The way to deal with this is to mark the "original" relationship as auto-terminated in the SplitContent settings tab.

Rising Star

Wonderful

It works now, Simon.

Rising Star

@Simon Elliston Ball

Thanks, I'm going to test your solution.