How to extract text from a multiline flow and create a single property containing the flow's entire content?
Labels: Apache NiFi
Created ‎05-27-2016 06:45 AM
Hi everybody,
As input I have a multiline flow like this one:
27/05/2016 06:28:34,000 ERROR [ACTIVE] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)' fr.pe.sldng.integration.rest.GestionnaireExceptionSollicitationRest Exception lors du traitement fr.pe.empl.service.data.exception.SLDNotFoundException: Mini site inconnu at fr.pe.empl.service.services.impl.MiniSiteServiceImpl.lire(MiniSiteServiceImpl.java:89) at
Then I use the "ExtractText" processor with multiline mode=true and a new property grok=^(.*)$
In the output, the property ${grok} contains only the first line.
My question: how can I retrieve all the input lines in this property?
Thanks for your answer.
Created ‎05-27-2016 08:22 AM
Hi @Thierry Vernhet,
To achieve what you are looking for, I believe you must set the property "Enable DOTALL mode" to true.
Below is a template that produces the expected result with the example you gave.
Hope this helps.
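To illustrate why DOTALL matters here, a minimal sketch using plain java.util.regex outside of NiFi. The class name `DotallDemo`, the helper `firstGroup`, and the shortened log text are made up for this example, and the sketch assumes ExtractText applies Java regex flags in the same way.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotallDemo {
    // Returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical stand-in for the multi-line log record in the question.
        String content = "27/05/2016 06:28:34,000 ERROR Exception lors du traitement\n"
                + "fr.pe.empl.service.data.exception.SLDNotFoundException: Mini site inconnu\n"
                + "at fr.pe.empl.service.services.impl.MiniSiteServiceImpl.lire(MiniSiteServiceImpl.java:89)";

        // MULTILINE only: '.' stops at the line terminator, so group 1 is the first line.
        String multilineOnly = firstGroup("^(.*)$", Pattern.MULTILINE, content);

        // MULTILINE + DOTALL: '.' also matches line terminators, so group 1 is everything.
        String withDotall = firstGroup("^(.*)$", Pattern.MULTILINE | Pattern.DOTALL, content);

        System.out.println(multilineOnly.equals(content)); // false
        System.out.println(withDotall.equals(content));    // true
    }
}
```

Without DOTALL, `.` never crosses a line break, which matches the "only the first line" symptom described in the question.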
Created ‎10-26-2016 02:10 PM
My input is as shown below:
- i, John, $100
- ii, Kevin, $150
- iii, Steve, $200
I used the ExtractText processor with Enable Multiline Mode=true, Enable DOTALL Mode=true and a new property line=(.*).
After execution I see the following in the provenance event, in the Attributes tab:
- line i, John, $100 ii, Kevin, $150 iii, Steve, $200
- line.0 i, John, $100 ii, Kevin, $150 iii, Steve, $200
- line.1 i, John, $100 ii, Kevin, $150 iii, Steve, $200
Expected output:
- line i, John, $100
- line.0 ii, Kevin, $150
- line.1 iii, Steve, $200
Please suggest.
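A possible explanation, sketched with plain java.util.regex rather than NiFi itself: with DOTALL enabled, `(.*)` matches the whole content in one go, so the whole match and the first capture group are identical. That would fit the three identical attributes above if `line.0` is the whole match and `line.1` the first group, which is an assumption here. Dropping DOTALL and matching repeatedly gives one line per match. The class and method names below are invented for the example, and how repeated matches map onto ExtractText attributes is not confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinePerMatchDemo {
    // Collects capture group 1 of every successive match in the input.
    static List<String> allGroups(String regex, int flags, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String content = "i, John, $100\nii, Kevin, $150\niii, Steve, $200";

        // DOTALL: the first match swallows every line, so group 0 == group 1 == content.
        Matcher whole = Pattern.compile("(.*)", Pattern.MULTILINE | Pattern.DOTALL).matcher(content);
        if (whole.find()) {
            System.out.println(whole.group(0).equals(content)); // true
            System.out.println(whole.group(1).equals(content)); // true
        }

        // MULTILINE without DOTALL: ^ and $ anchor at each line, '.' stays within a line.
        // (.+) is used instead of (.*) to avoid empty matches between lines.
        for (String line : allGroups("^(.+)$", Pattern.MULTILINE, content)) {
            System.out.println(line);
        }
        // prints:
        // i, John, $100
        // ii, Kevin, $150
        // iii, Steve, $200
    }
}
```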
Created ‎05-27-2016 08:50 AM
@Pierre Villard It's OK, Thanks a lot.
Created ‎05-27-2016 12:53 PM
Keep in mind that FlowFile attributes live in memory. Loading a FlowFile attribute with the entire content of the file is going to have an impact on heap usage in your flow. That being said, there are two things to consider when building dataflows like this:
1. Increasing the size of the available heap for the NiFi application. Heap space thresholds for NiFi are configured in the bootstrap.conf file and by default are very small (512 MB).
# JVM memory settings
java.arg.2=-Xms512m
java.arg.3=-Xmx512m
2. You must take into consideration the data volumes you will be working with in the particular dataflow. To help prevent out-of-memory errors in NiFi, we have established a threshold on how many FlowFiles can queue on a connection before their attributes are swapped out of heap to disk. The default configuration in the nifi.properties file is 20,000 (nifi.queue.swap.threshold=20000). This is per connection, not per flow. So if the FlowFiles whose content you extracted into attributes begin to queue on numerous connections, you run the risk of hitting an out-of-memory condition sooner. You can decrease this value so swapping happens sooner, but that will in turn have an impact on performance.
I would start with increasing the heap memory for your NiFi and then go from there.
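To put rough numbers on this, a back-of-the-envelope sketch. The 10,000-character attribute size is hypothetical, the 2 bytes per character assumes UTF-16 Java strings, and the estimate ignores per-object overhead and everything else on the heap, so treat it as an order-of-magnitude figure only.

```java
public class AttributeHeapEstimate {
    // Rough heap bytes held by one large attribute across all FlowFiles queued on a connection.
    static long estimateBytes(long queuedFlowFiles, long charsPerAttribute, int bytesPerChar) {
        return queuedFlowFiles * charsPerAttribute * bytesPerChar;
    }

    public static void main(String[] args) {
        long swapThreshold = 20_000;          // nifi.queue.swap.threshold default quoted above
        long heapBytes = 512L * 1024 * 1024;  // -Xmx512m default quoted above
        long charsPerAttribute = 10_000;      // hypothetical: content size stored in the attribute
        int bytesPerChar = 2;                 // assumption: UTF-16 encoded Java String

        long estimate = estimateBytes(swapThreshold, charsPerAttribute, bytesPerChar);
        System.out.printf("%d MB of %d MB heap%n",
                estimate / (1024 * 1024), heapBytes / (1024 * 1024));
        // prints: 381 MB of 512 MB heap
    }
}
```

Under these assumptions a single full connection could already hold most of the default heap before swapping starts, which is why the heap size is the first thing to revisit.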
Created ‎05-30-2016 07:39 AM
Hi @mclark,
Thanks. Before ExtractText we use TailFile, so each FlowFile contains only a few records of the entire file.
