How to extract text from a multiline FlowFile and create a single property holding the entire content?

Rising Star

Hi everybody

As input, I have a multiline FlowFile like this one:

27/05/2016 06:28:34,000 ERROR [ACTIVE] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)' fr.pe.sldng.integration.rest.GestionnaireExceptionSollicitationRest Exception lors du traitement fr.pe.empl.service.data.exception.SLDNotFoundException: Mini site inconnu at fr.pe.empl.service.services.impl.MiniSiteServiceImpl.lire(MiniSiteServiceImpl.java:89) at

Then I use the "ExtractText" processor with Enable Multiline Mode set to true and a new property grok=^(.*)$

In the output, the ${grok} attribute contains only the first line.

My question: how can I retrieve all the input lines in this property?

Thanks for your answer.

1 ACCEPTED SOLUTION

Hi @Thierry Vernhet,

To achieve what you are looking for, I believe you must set the property "Enable DOTALL mode" to true.

Below is a template that produces the expected result with the example you gave.

extracttextall.xml
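
To illustrate why DOTALL matters here, the following is a minimal sketch in plain Java regex (the engine ExtractText is built on). The class name `DotallDemo`, the helper `firstGroup`, and the three-line sample string are assumptions made up for the illustration; they are not taken from the attached template.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotallDemo {
    // Returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical multiline content standing in for the FlowFile.
        String content = "line one\nline two\nline three";

        // MULTILINE only: '.' stops at the line break, so only the first line is captured.
        System.out.println(firstGroup("^(.*)$", Pattern.MULTILINE, content));

        // MULTILINE + DOTALL: '.' also matches line breaks, so the whole content is captured.
        System.out.println(firstGroup("^(.*)$", Pattern.MULTILINE | Pattern.DOTALL, content));
    }
}
```

Without DOTALL, `.` does not match line terminators, which is consistent with ${grok} holding only the first line.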

Hope this helps.

5 REPLIES

Rising Star

Hi @Pierre Villard

My input is as shown below

  • i, John, $100
  • ii, Kevin, $150
  • iii, Steve, $200

I used the ExtractText processor with Enable Multiline Mode=true, Enable DOTALL Mode=true and a new property line=(.*).

After execution, I see the following in the Attributes tab of the provenance event:

  • line i, John, $100 ii, Kevin, $150 iii, Steve, $200
  • line.0 i, John, $100 ii, Kevin, $150 iii, Steve, $200
  • line.1 i, John, $100 ii, Kevin, $150 iii, Steve, $200

Expected output

  • line i, John, $100
  • line.0 ii, Kevin, $150
  • line.1 iii, Steve, $200

Please suggest.
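
A possible explanation, sketched in plain Java regex: with DOTALL enabled, `(.*)` matches the whole content in a single match, so the whole match (group 0) and capture group 1 are identical. As far as I understand ExtractText, `line` and `line.1` both come from capture group 1 and `line.0` from the whole match, which would explain three identical values. The class `PerLineDemo` and its methods are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerLineDemo {
    // With MULTILINE + DOTALL, "(.*)" matches everything at once:
    // returns {group 0, group 1}, which are the same string.
    static String[] wholeMatch(String input) {
        Matcher m = Pattern.compile("(.*)", Pattern.MULTILINE | Pattern.DOTALL).matcher(input);
        m.find();
        return new String[] { m.group(0), m.group(1) };
    }

    // With MULTILINE only, repeated find() calls yield one non-empty line per match.
    static List<String> lines(String input) {
        Matcher m = Pattern.compile("^(.+)$", Pattern.MULTILINE).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String content = "i, John, $100\nii, Kevin, $150\niii, Steve, $200";
        String[] groups = wholeMatch(content);
        System.out.println(groups[0].equals(groups[1])); // both hold the full content
        lines(content).forEach(System.out::println);     // one entry per input line
    }
}
```

The `find()` loop is plain Java and is not something this thread confirms ExtractText does. In NiFi terms, one possible route (an assumption, not confirmed here) is to split the content into one FlowFile per line, for example with SplitText, before applying ExtractText.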

Rising Star

@Pierre Villard It's OK, Thanks a lot.

Master Mentor

Keep in mind that FlowFile attributes live in memory. Loading the entire content of a file into a FlowFile attribute will increase heap usage in your flow. That being said, there are two things to consider when building dataflows like this:

1. Increase the size of the heap available to the NiFi application. Heap settings for NiFi are configured in the bootstrap.conf file, and the defaults are very small (512 MB):

# JVM memory settings
java.arg.2=-Xms512m
java.arg.3=-Xmx512m

2. Take into consideration the data volumes you will be working with in this particular dataflow. To help prevent out-of-memory errors in NiFi, there is a threshold on how many FlowFiles can queue on a connection before their attributes are swapped out of heap to disk. The default in the nifi.properties file is 20,000 (nifi.queue.swap.threshold=20000), and it applies per connection, not per flow. So if the FlowFiles carrying extracted content begin to queue on numerous connections, you run the risk of hitting an out-of-memory condition sooner. You can decrease this value so that swapping happens earlier, but that will in turn affect performance.
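
To get a feel for the numbers, here is a rough back-of-the-envelope sketch. The class `AttributeHeapEstimate`, the figure of 2,000 characters per attribute, the 3 connections, and the 2 bytes per character are all assumptions chosen for illustration; the result ignores object overhead and every other attribute, so treat it as a lower bound rather than a measurement.

```java
public class AttributeHeapEstimate {
    // Rough lower bound on heap bytes held by content-sized attributes
    // for FlowFiles queued just below the swap threshold.
    static long estimateBytes(long flowFilesPerConnection, long charsPerAttribute,
                              int connections, int bytesPerChar) {
        return flowFilesPerConnection * charsPerAttribute * connections * bytesPerChar;
    }

    public static void main(String[] args) {
        // Assumed values: 20,000 queued FlowFiles per connection (the default threshold),
        // 2,000 characters of extracted content each, 3 connections, 2 bytes per char.
        long bytes = estimateBytes(20_000, 2_000, 3, 2);
        System.out.println(bytes + " bytes");
    }
}
```

With these assumed values the estimate is 240,000,000 bytes, a bit under half of a 512 MB heap, before counting anything else NiFi keeps in memory.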

I would start by increasing the heap memory for your NiFi and then go from there.

Rising Star

Hi @mclark

Thanks. Before ExtractText we use TailFile, so each FlowFile contains only a few records of the entire file.