Member since 07-30-2019
3406 Posts
1622 Kudos Received
1008 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 185 | 12-17-2025 05:55 AM |
| | 246 | 12-15-2025 01:29 PM |
| | 183 | 12-15-2025 06:50 AM |
| | 277 | 12-05-2025 08:25 AM |
| | 460 | 12-03-2025 10:21 AM |
01-13-2021
11:42 AM
1 Kudo
@Fierymech I am not able to reproduce the 1 file in and 7 files out that you described. Are you sure it is only 1 FlowFile in? As far as your use case goes, perhaps you could use RouteText first to produce a new FlowFile with all the lines containing the data you want (this includes the leading "*" and trailing "yes yes ..." strings). To do this you would use the following regex and a "Matching Strategy" of "Contains Regular Expression":
\d{1,2}\s\w{1,4}\-\w{1,4}\-(\w{1,2}|\w{0})\s\d{1,5}\s\d{1,2}\-\w{1,3}\s\d{1,2}\:\d{1,2}\:\d{1,2}
Then pass that new FlowFile to a ReplaceText processor, which can trim off the leading whitespace and "*" as well as any trailing whitespace characters and additional text. This ReplaceText would be configured as follows: And use a Search Value which contains 3 top-level capture groups (the second one wraps the regex above, including its nested group):
(.*?)(\d{1,2}\s\w{1,4}\-\w{1,4}\-(\w{1,2}|\w{0})\s\d{1,5}\s\d{1,2}\-\w{1,3}\s\d{1,2}\:\d{1,2}\:\d{1,2})(.*?)$
The processor replaces text line-by-line with only the second capture group. This worked for me to get the end result of: Which is what I believe you are looking for in the resulting FlowFile's content. Hope this helps, Matt
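If you want to sanity-check the two regexes outside of NiFi first, here is a minimal Python sketch of the same two steps. It is only an approximation: Python's `re` module stands in for NiFi's Java regex engine, `re.search` stands in for "Contains Regular Expression", `\2` plays the role of `$2` in ReplaceText, and the few sample lines are a shortened, hand-typed stand-in for your data.

```python
import re

# Regex from the post above (the RouteText dynamic property value).
ROW = (r"\d{1,2}\s\w{1,4}\-\w{1,4}\-(\w{1,2}|\w{0})\s\d{1,5}"
       r"\s\d{1,2}\-\w{1,3}\s\d{1,2}\:\d{1,2}\:\d{1,2}")

# ReplaceText Search Value: lazy prefix, the row itself, lazy suffix to end of line.
SEARCH = re.compile(r"(.*?)(" + ROW + r")(.*?)$")


def route_and_trim(content):
    """Keep lines that contain ROW, then reduce each one to capture group 2."""
    kept = [line for line in content.splitlines() if re.search(ROW, line)]
    return [SEARCH.sub(r"\2", line) for line in kept]


# Hypothetical, shortened sample input (not the full original data).
sample = """\
 #  Name      S/N   Since
 0 hub-lr1-0 35189 20-Dec 03:43:54
* 3 rr1-2627- 34748 20-Dec 03:43:54 yes yes 0
*17 hub-rr3-g 35217 20-Dec 03:43:56
Logs for Radio IP xx.xx.xx.xx
"""

for row in route_and_trim(sample):
    print(row)
```

The header and "Logs for ..." lines are dropped in the first step, and the second step strips the leading "*" and the trailing "yes yes 0" from the rows that remain.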
01-13-2021
11:03 AM
1 Kudo
@Fierymech Here is the raw XML for the template:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<template encoding-version="1.3">
<description></description>
<groupId>658f3a7f-0171-1000-0000-00007706d23d</groupId>
<name>RouteText-Example</name>
<snippet>
<connections>
<id>44a4a771-a46f-3b74-0000-000000000000</id>
<parentGroupId>2edcec51-fc4e-38cf-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<bends>
<x>0.0</x>
<y>408.0</y>
</bends>
<destination>
<groupId>2edcec51-fc4e-38cf-0000-000000000000</groupId>
<id>e81dcc20-cb6f-3466-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression>
<loadBalancePartitionAttribute></loadBalancePartitionAttribute>
<loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus>
<loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy>
<name></name>
<selectedRelationships>original</selectedRelationships>
<source>
<groupId>2edcec51-fc4e-38cf-0000-000000000000</groupId>
<id>696c796c-aa86-3e71-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<connections>
<id>7bcc4c55-39cf-3b4c-0000-000000000000</id>
<parentGroupId>2edcec51-fc4e-38cf-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<destination>
<groupId>2edcec51-fc4e-38cf-0000-000000000000</groupId>
<id>696c796c-aa86-3e71-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression>
<loadBalancePartitionAttribute></loadBalancePartitionAttribute>
<loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus>
<loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy>
<name></name>
<selectedRelationships>success</selectedRelationships>
<source>
<groupId>2edcec51-fc4e-38cf-0000-000000000000</groupId>
<id>7386cd31-1cfd-3e04-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<connections>
<id>dde64489-3d09-3e4f-0000-000000000000</id>
<parentGroupId>2edcec51-fc4e-38cf-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<bends>
<x>480.0</x>
<y>408.0</y>
</bends>
<destination>
<groupId>2edcec51-fc4e-38cf-0000-000000000000</groupId>
<id>e81dcc20-cb6f-3466-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression>
<loadBalancePartitionAttribute></loadBalancePartitionAttribute>
<loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus>
<loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy>
<name></name>
<selectedRelationships>unmatched</selectedRelationships>
<source>
<groupId>2edcec51-fc4e-38cf-0000-000000000000</groupId>
<id>696c796c-aa86-3e71-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<connections>
<id>f2b4b8ba-09e7-3125-0000-000000000000</id>
<parentGroupId>2edcec51-fc4e-38cf-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<bends>
<x>240.0</x>
<y>408.0</y>
</bends>
<destination>
<groupId>2edcec51-fc4e-38cf-0000-000000000000</groupId>
<id>e81dcc20-cb6f-3466-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression>
<loadBalancePartitionAttribute></loadBalancePartitionAttribute>
<loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus>
<loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy>
<name></name>
<selectedRelationships>matched</selectedRelationships>
<source>
<groupId>2edcec51-fc4e-38cf-0000-000000000000</groupId>
<id>696c796c-aa86-3e71-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<processors>
<id>19e6484f-37ca-3639-0000-000000000000</id>
<parentGroupId>2edcec51-fc4e-38cf-0000-000000000000</parentGroupId>
<position>
<x>736.0</x>
<y>0.0</y>
</position>
<bundle>
<artifact>nifi-standard-nar</artifact>
<group>org.apache.nifi</group>
<version>1.12.1.3.5.2.0-99</version>
</bundle>
<config>
<bulletinLevel>WARN</bulletinLevel>
<comments></comments>
<concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
<descriptors>
<entry>
<key>Routing Strategy</key>
<value>
<name>Routing Strategy</name>
</value>
</entry>
<entry>
<key>Matching Strategy</key>
<value>
<name>Matching Strategy</name>
</value>
</entry>
<entry>
<key>Character Set</key>
<value>
<name>Character Set</name>
</value>
</entry>
<entry>
<key>Ignore Leading/Trailing Whitespace</key>
<value>
<name>Ignore Leading/Trailing Whitespace</name>
</value>
</entry>
<entry>
<key>Ignore Case</key>
<value>
<name>Ignore Case</name>
</value>
</entry>
<entry>
<key>Grouping Regular Expression</key>
<value>
<name>Grouping Regular Expression</name>
</value>
</entry>
<entry>
<key>lines</key>
<value>
<name>lines</name>
</value>
</entry>
<entry>
<key>matched</key>
<value>
<name>matched</name>
</value>
</entry>
</descriptors>
<executionNode>ALL</executionNode>
<lossTolerant>false</lossTolerant>
<penaltyDuration>30 sec</penaltyDuration>
<properties>
<entry>
<key>Routing Strategy</key>
<value>Route to each matching Property Name</value>
</entry>
<entry>
<key>Matching Strategy</key>
<value>Contains Regular Expression</value>
</entry>
<entry>
<key>Character Set</key>
<value>UTF-8</value>
</entry>
<entry>
<key>Ignore Leading/Trailing Whitespace</key>
<value>true</value>
</entry>
<entry>
<key>Ignore Case</key>
<value>false</value>
</entry>
<entry>
<key>Grouping Regular Expression</key>
</entry>
<entry>
<key>lines</key>
<value>\d{0,2}\sabc-\w{0,2}\d{0,2}-\d{0,2}\w{0,2}\s\d{0,6}\s\d{0,2}-\w{0,3}\s\d{0,2}\:\d{0,2}\:\d{0,2}</value>
</entry>
<entry>
<key>matched</key>
<value>(\s|[ \t]|\*)\d{1,2}\s\w{1,4}\-\w{1,4}\-(\w{1,2}|\w{0})\s\d{1,5}\s\d{1,2}\-\w{1,3}\s\d{1,2}\:\d{1,2}\:\d{1,2}</value>
</entry>
</properties>
<runDurationMillis>0</runDurationMillis>
<schedulingPeriod>0 sec</schedulingPeriod>
<schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
<yieldDuration>1 sec</yieldDuration>
</config>
<executionNodeRestricted>false</executionNodeRestricted>
<name>RouteText</name>
<relationships>
<autoTerminate>false</autoTerminate>
<name>lines</name>
</relationships>
<relationships>
<autoTerminate>false</autoTerminate>
<name>matched</name>
</relationships>
<relationships>
<autoTerminate>false</autoTerminate>
<name>original</name>
</relationships>
<relationships>
<autoTerminate>false</autoTerminate>
<name>unmatched</name>
</relationships>
<state>STOPPED</state>
<style/>
<type>org.apache.nifi.processors.standard.RouteText</type>
</processors>
<processors>
<id>696c796c-aa86-3e71-0000-000000000000</id>
<parentGroupId>2edcec51-fc4e-38cf-0000-000000000000</parentGroupId>
<position>
<x>64.0</x>
<y>248.0</y>
</position>
<bundle>
<artifact>nifi-standard-nar</artifact>
<group>org.apache.nifi</group>
<version>1.12.1.3.5.2.0-99</version>
</bundle>
<config>
<bulletinLevel>WARN</bulletinLevel>
<comments></comments>
<concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
<descriptors>
<entry>
<key>Routing Strategy</key>
<value>
<name>Routing Strategy</name>
</value>
</entry>
<entry>
<key>Matching Strategy</key>
<value>
<name>Matching Strategy</name>
</value>
</entry>
<entry>
<key>Character Set</key>
<value>
<name>Character Set</name>
</value>
</entry>
<entry>
<key>Ignore Leading/Trailing Whitespace</key>
<value>
<name>Ignore Leading/Trailing Whitespace</name>
</value>
</entry>
<entry>
<key>Ignore Case</key>
<value>
<name>Ignore Case</name>
</value>
</entry>
<entry>
<key>Grouping Regular Expression</key>
<value>
<name>Grouping Regular Expression</name>
</value>
</entry>
<entry>
<key>matched</key>
<value>
<name>matched</name>
</value>
</entry>
</descriptors>
<executionNode>ALL</executionNode>
<lossTolerant>false</lossTolerant>
<penaltyDuration>30 sec</penaltyDuration>
<properties>
<entry>
<key>Routing Strategy</key>
<value>Route to each matching Property Name</value>
</entry>
<entry>
<key>Matching Strategy</key>
<value>Matches Regular Expression</value>
</entry>
<entry>
<key>Character Set</key>
<value>UTF-8</value>
</entry>
<entry>
<key>Ignore Leading/Trailing Whitespace</key>
<value>true</value>
</entry>
<entry>
<key>Ignore Case</key>
<value>false</value>
</entry>
<entry>
<key>Grouping Regular Expression</key>
</entry>
<entry>
<key>matched</key>
<value>(\s|[ \t]|\*)\d{1,2}\s\w{1,4}\-\w{1,4}\-(\w{1,2}|\w{0})\s\d{1,5}\s\d{1,2}\-\w{1,3}\s\d{1,2}\:\d{1,2}\:\d{1,2}</value>
</entry>
</properties>
<runDurationMillis>0</runDurationMillis>
<schedulingPeriod>0 sec</schedulingPeriod>
<schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
<yieldDuration>1 sec</yieldDuration>
</config>
<executionNodeRestricted>false</executionNodeRestricted>
<name>RouteText</name>
<relationships>
<autoTerminate>false</autoTerminate>
<name>matched</name>
</relationships>
<relationships>
<autoTerminate>false</autoTerminate>
<name>original</name>
</relationships>
<relationships>
<autoTerminate>false</autoTerminate>
<name>unmatched</name>
</relationships>
<state>STOPPED</state>
<style/>
<type>org.apache.nifi.processors.standard.RouteText</type>
</processors>
<processors>
<id>7386cd31-1cfd-3e04-0000-000000000000</id>
<parentGroupId>2edcec51-fc4e-38cf-0000-000000000000</parentGroupId>
<position>
<x>56.0</x>
<y>40.0</y>
</position>
<bundle>
<artifact>nifi-standard-nar</artifact>
<group>org.apache.nifi</group>
<version>1.12.1.3.5.2.0-99</version>
</bundle>
<config>
<bulletinLevel>WARN</bulletinLevel>
<comments></comments>
<concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
<descriptors>
<entry>
<key>File Size</key>
<value>
<name>File Size</name>
</value>
</entry>
<entry>
<key>Batch Size</key>
<value>
<name>Batch Size</name>
</value>
</entry>
<entry>
<key>Data Format</key>
<value>
<name>Data Format</name>
</value>
</entry>
<entry>
<key>Unique FlowFiles</key>
<value>
<name>Unique FlowFiles</name>
</value>
</entry>
<entry>
<key>generate-ff-custom-text</key>
<value>
<name>generate-ff-custom-text</name>
</value>
</entry>
<entry>
<key>character-set</key>
<value>
<name>character-set</name>
</value>
</entry>
<entry>
<key>mime-type</key>
<value>
<name>mime-type</name>
</value>
</entry>
</descriptors>
<executionNode>PRIMARY</executionNode>
<lossTolerant>false</lossTolerant>
<penaltyDuration>30 sec</penaltyDuration>
<properties>
<entry>
<key>File Size</key>
<value>0B</value>
</entry>
<entry>
<key>Batch Size</key>
<value>1</value>
</entry>
<entry>
<key>Data Format</key>
<value>Text</value>
</entry>
<entry>
<key>Unique FlowFiles</key>
<value>false</value>
</entry>
<entry>
<key>generate-ff-custom-text</key>
<value> Capable Qd Tx
# Name S/N Since Mas/Sla bytes
-- --------- ----- --------------- --- --- ---------
0 hub-lr1-0 35189 20-Dec 03:43:54
1 lr2-27-27 35209 20-Dec 03:43:54
2 dt27-kcd- 35185 20-Dec 03:43:54
* 3 rr1-2627- 34748 20-Dec 03:43:54 yes yes 0
4 hub-rr2-g 34609 20-Dec 03:43:54
5 hub-lr2-0 34686 20-Dec 03:43:54
6 hub-lr1-0 34631 20-Dec 03:43:54
7 hub-rr3-g 34692 20-Dec 03:43:54
8 hub-rr3-g 34568 20-Dec 03:43:54
9 hub-rr2-g 35203 20-Dec 03:43:54
10 hub-rr2-g 35200 20-Dec 03:43:54
11 hub-lr1-0 35205 20-Dec 03:43:54
12 hub-rr1-0 34394 20-Dec 03:43:54
13 hub-rr3-g 35191 20-Dec 03:43:54
14 hub-lr2-0 35196 20-Dec 03:43:54
15 hub-lr1-0 35214 20-Dec 03:43:54
16 hub-rr1-0 34577 20-Dec 03:43:54
*17 hub-rr3-g 35217 20-Dec 03:43:56
Logs for Radio IP xx.xx.xx.xx
telnet> Trying xx.xx.xx.xx...
Logs for Radio IP xx.xx.xx.xx
telnet> Trying xx.xx.xx.xx...
Connected to xx.xx.xx.xx.
Escape character is '^]'.
Capable Qd Tx
# Name S/N Since Mas/Sla bytes
-- --------- ----- --------------- --- --- ---------
0 hub-lr1-0 35189 20-Dec 03:43:54
1 hub-rr2-g 35209 20-Dec 03:43:54
2 hub-rr1-0 35185 20-Dec 03:43:54
3 hub-rr1-0 34748 20-Dec 03:43:54
4 hub-rr2-g 34609 20-Dec 03:43:54
5 hub-lr2-0 34686 20-Dec 03:43:54
6 hub-lr1-0 34631 20-Dec 03:43:54
7 hub-rr3-g 34692 20-Dec 03:43:54
8 hub-rr3-g 34568 20-Dec 03:43:54
9 hub-rr2-g 35203 20-Dec 03:43:54
10 hub-rr2-g 35200 20-Dec 03:43:54
11 hub-lr1-0 35205 20-Dec 03:43:54
12 hub-rr1-0 34394 20-Dec 03:43:54
13 hub-rr3-g 35191 20-Dec 03:43:54
14 hub-lr2-0 35196 20-Dec 03:43:54
15 hub-lr1-0 35214 20-Dec 03:43:54
16 hub-rr1-0 34577 20-Dec 03:43:54
*17 hub-rr3-g 35217 20-Dec 03:43:54 yes yes 0 </value>
</entry>
<entry>
<key>character-set</key>
<value>UTF-8</value>
</entry>
<entry>
<key>mime-type</key>
</entry>
</properties>
<runDurationMillis>0</runDurationMillis>
<schedulingPeriod>60 sec</schedulingPeriod>
<schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
<yieldDuration>1 sec</yieldDuration>
</config>
<executionNodeRestricted>false</executionNodeRestricted>
<name>GenerateFlowFile</name>
<relationships>
<autoTerminate>false</autoTerminate>
<name>success</name>
</relationships>
<state>STOPPED</state>
<style/>
<type>org.apache.nifi.processors.standard.GenerateFlowFile</type>
</processors>
<processors>
<id>e81dcc20-cb6f-3466-0000-000000000000</id>
<parentGroupId>2edcec51-fc4e-38cf-0000-000000000000</parentGroupId>
<position>
<x>64.0</x>
<y>440.0</y>
</position>
<bundle>
<artifact>nifi-update-attribute-nar</artifact>
<group>org.apache.nifi</group>
<version>1.12.1.3.5.2.0-99</version>
</bundle>
<config>
<bulletinLevel>WARN</bulletinLevel>
<comments></comments>
<concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
<descriptors>
<entry>
<key>Delete Attributes Expression</key>
<value>
<name>Delete Attributes Expression</name>
</value>
</entry>
<entry>
<key>Store State</key>
<value>
<name>Store State</name>
</value>
</entry>
<entry>
<key>Stateful Variables Initial Value</key>
<value>
<name>Stateful Variables Initial Value</name>
</value>
</entry>
<entry>
<key>canonical-value-lookup-cache-size</key>
<value>
<name>canonical-value-lookup-cache-size</name>
</value>
</entry>
</descriptors>
<executionNode>ALL</executionNode>
<lossTolerant>false</lossTolerant>
<penaltyDuration>30 sec</penaltyDuration>
<properties>
<entry>
<key>Delete Attributes Expression</key>
</entry>
<entry>
<key>Store State</key>
<value>Do not store state</value>
</entry>
<entry>
<key>Stateful Variables Initial Value</key>
</entry>
<entry>
<key>canonical-value-lookup-cache-size</key>
<value>100</value>
</entry>
</properties>
<runDurationMillis>0</runDurationMillis>
<schedulingPeriod>0 sec</schedulingPeriod>
<schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
<yieldDuration>1 sec</yieldDuration>
</config>
<executionNodeRestricted>false</executionNodeRestricted>
<name>UpdateAttribute</name>
<relationships>
<autoTerminate>false</autoTerminate>
<name>success</name>
</relationships>
<state>STOPPED</state>
<style/>
<type>org.apache.nifi.processors.attributes.UpdateAttribute</type>
</processors>
</snippet>
<timestamp>01/13/2021 16:41:18 UTC</timestamp>
</template>
Save this entire XML snippet to a file with the .xml extension and import it as a template in your NiFi. Hope this helps, Matt
01-13-2021
08:46 AM
Blocks my attachment (sorry)
01-13-2021
08:45 AM
1 Kudo
@Fierymech RouteText does not modify the content of the lines. It only routes lines to newly produced FlowFiles; the content of those lines remains unchanged. The RouteText processor also does nothing with capture groups, so the entire regex is evaluated against each line. I took the entire content from your "https://regex101.com/r/pdo6Ca/1" and the entire new regex from the same link and ran this flow. I produced only one source FlowFile, as you can see from the processor's In = 1. You can see it routed three FlowFiles out. One went to the connection with the "matched" relationship and contains only one line, since only one line matched the entire regex (regex101 is taking into account your capture groups). If you change: And run the same test again, you will see a few more lines match (those with the additional "yes yes ..." in the lines). I attached the template I used, which you can import into your NiFi. The community does not support .xml files, so I changed the extension to .txt. You will need to change the extension back to .xml before you can import the template into your NiFi. Hope this helps.
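To see why a whole-line match and a "contains" match give different counts, here is a rough Python sketch. It assumes the difference comes down to the Matching Strategy ("Matches Regular Expression" versus "Contains Regular Expression"), models those with `fullmatch` and `search`, and uses `strip()` as a stand-in for "Ignore Leading/Trailing Whitespace". The regex is the "matched" property value from the template in this thread; the four input lines are a small hand-picked subset.

```python
import re

# "matched" property value from the template shared in this thread.
PATTERN = re.compile(
    r"(\s|[ \t]|\*)\d{1,2}\s\w{1,4}\-\w{1,4}\-(\w{1,2}|\w{0})"
    r"\s\d{1,5}\s\d{1,2}\-\w{1,3}\s\d{1,2}\:\d{1,2}\:\d{1,2}"
)


def route(lines, whole_line):
    """Split lines into (matched, unmatched) without changing their content.

    whole_line=True  approximates "Matches Regular Expression" (fullmatch)
    whole_line=False approximates "Contains Regular Expression" (search)
    """
    test = PATTERN.fullmatch if whole_line else PATTERN.search
    matched, unmatched = [], []
    for line in lines:
        # strip() stands in for "Ignore Leading/Trailing Whitespace = true"
        (matched if test(line.strip()) else unmatched).append(line)
    return matched, unmatched


lines = [
    "  0 hub-lr1-0 35189 20-Dec 03:43:54",
    "* 3 rr1-2627- 34748 20-Dec 03:43:54 yes yes 0",
    "*17 hub-rr3-g 35217 20-Dec 03:43:56",
    "*17 hub-rr3-g 35217 20-Dec 03:43:54 yes yes 0",
]

strict, _ = route(lines, whole_line=True)
loose, _ = route(lines, whole_line=False)
print(len(strict), "line(s) match the whole regex")
print(len(loose), "line(s) contain the regex")
```

With the whole-line test only the "*17 ... 03:43:56" line qualifies; the "yes yes 0" lines are picked up only once the regex is allowed to match part of the line.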
01-13-2021
06:04 AM
1 Kudo
@Fierymech It may be helpful if you shared your RouteText processor configuration. Correct me if I am wrong, but you are looking to have all lines (minus the header lines) placed in a new FlowFile by themselves. Using your example data and the regex you provided:
wwwwww aa cc
# Name foo Since ddd/www dddd
-- --------- ----- --------------- --- --- ---------
0 abc-lr1-0 35189 20-Dec 03:43:54
1 abc-rr2-g 35209 20-Dec 03:43:54
* 2 abc-rr1-0 35185 20-Dec 03:43:54
* 15 abc-lr2-0 34686 20-Dec 03:43:54
16 abc-lr1-0 34631 20-Dec 03:43:54
The above would result in a FlowFile with only lines 0, 1, and 16. The header plus lines 2 and 15 would route to unmatched because of the leading "*", which does not match your regex. The result would be a FlowFile with:
0 abc-lr1-0 35189 20-Dec 03:43:54
1 abc-rr2-g 35209 20-Dec 03:43:54
16 abc-lr1-0 34631 20-Dec 03:43:54
A couple of things to check if what you are seeing is the entire original FlowFile getting routed to the "original" and "unmatched" relationships:
1. RouteText processor configuration. If I understand your use case correctly, it should be configured like this:
2. I noticed your sample data has leading and trailing whitespace, so make sure the processor is configured to ignore those.
3. Since your intent is to produce a new FlowFile with only the lines matching the regex, make sure you set the above Routing Strategy.
4. Make sure the correct matching strategy is selected. It should be what I have above.
5. Click on the "+" to add a new dynamic property for your regex. The property name becomes a new relationship on the processor where your matching lines will be routed.
6. Since you are evaluating the source FlowFile content line-by-line, make sure your regex does not have a line return at the end of it. Correct: Incorrect (notice the line 2 which indicates a line return at end of regex):
When I ran a little test flow using your sample data and regex, I got the desired results: The "lines" relationship has one new FlowFile with content of only the 3 matching lines. The "unmatched" relationship contains a new FlowFile with content containing all the unmatched lines. The "original" relationship contains the original FlowFile that was processed by this processor. If you don't care about the original or unmatched FlowFiles, you can simply auto-terminate those relationships instead of routing them out of the processor in connections as I did above. Hope this helps, Matt
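The checklist above can be tried out in a few lines of Python before touching the processor. This is only a sketch under assumptions: the regex is the "lines" property value from the template shared in this thread, the match is modeled as a whole-line match after trimming whitespace (my guess at the configuration), and `route_text` is a made-up helper, not a NiFi API. The last call shows what a stray line return at the end of the regex does.

```python
import re

# "lines" property value from the template shared in this thread.
LINES = (r"\d{0,2}\sabc-\w{0,2}\d{0,2}-\d{0,2}\w{0,2}\s\d{0,6}"
         r"\s\d{0,2}-\w{0,3}\s\d{0,2}\:\d{0,2}\:\d{0,2}")


def route_text(content, regex):
    """Route each line to "lines" or "unmatched"; line content is left as-is."""
    pattern = re.compile(regex)
    routed = {"lines": [], "unmatched": []}
    for line in content.splitlines():
        # Trim only for the comparison, mirroring "ignore whitespace".
        key = "lines" if pattern.fullmatch(line.strip()) else "unmatched"
        routed[key].append(line)
    return routed


content = """\
 wwwwww aa cc
 # Name foo Since ddd/www dddd
 -- --------- ----- --------------- --- --- ---------
 0 abc-lr1-0 35189 20-Dec 03:43:54
 1 abc-rr2-g 35209 20-Dec 03:43:54
* 2 abc-rr1-0 35185 20-Dec 03:43:54
* 15 abc-lr2-0 34686 20-Dec 03:43:54
 16 abc-lr1-0 34631 20-Dec 03:43:54
"""

good = route_text(content, LINES)
bad = route_text(content, LINES + "\n")  # regex with a line return at the end

print(good["lines"])       # rows 0, 1 and 16
print(len(bad["lines"]))   # nothing matches once the regex ends in a newline
```

Because each line is evaluated on its own and never contains the newline, a regex that ends with one cannot match any line, and everything lands in "unmatched".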
01-12-2021
05:49 AM
@Gcima009 It might be helpful to have more context around your issue. What action and/or NiFi component are you using when the exception occurs? What NiFi version are you using? Can you share the entire error log from nifi-app.log? Thanks, Matt
01-12-2021
05:34 AM
@Raj123 I am not a Java developer, but NiFi is written in Java and the source code is open source. You would need to look at the code for the CSVReader to see how it handles Avro schema inference. Sorry that I cannot be of more help with this specific query.
01-12-2021
05:21 AM
@CristoE Since this question already has an accepted solution and is specific to a DISTCP replacement for HDFS, it would be much better to start an entirely new question in the community. You can always add a link to this question's solution as a reference in your new question post. You would get more visibility that way, and we would not dilute the answer to this question with suggestions related to ADLS rather than HDFS.
01-08-2021
01:12 PM
@Raj123 NiFi offers many "record" based processors that support various record readers and writers. Those record readers have the ability to infer an Avro schema from the incoming record, and the record writer can be configured to write the inferred schema to an attribute on the outgoing FlowFile. There is no specific infer schema processor for CSV source data. That would require a custom processor (perhaps one that utilizes the existing CSVReader controller service). Typically you would use a record based processor to manipulate, split, or validate your records, so I am not sure of the value or use case for only wanting to infer the Avro schema. That being said, you can get that inferred schema, for example, by simply using the "ConvertRecord" processor with a "CSVReader" (configured to infer schema) and a "CSVRecordSetWriter" (configured to "Set 'avro.schema' Attribute"). The written FlowFile will be the same as the source FlowFile, but it will have an additional "avro.schema" attribute on the FlowFile containing the inferred Avro schema. ConvertRecord: CSVReader: CSVRecordSetWriter: Hope this helps, Matt
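To give a feel for what ends up in that "avro.schema" attribute, here is a small Python sketch that infers an Avro-style record schema from CSV text. To be clear, this is not NiFi's inference algorithm (as mentioned, you would need to read the CSVReader source for that); the type rules, the record name "inferred", and the sample CSV are all assumptions made up for illustration.

```python
import csv
import io
import json


def infer_type(values):
    """Pick the narrowest of long/double/string that fits every value."""
    for avro_type, cast in (("long", int), ("double", float)):
        try:
            for value in values:
                cast(value)
            return avro_type
        except ValueError:
            continue
    return "string"


def infer_avro_schema(csv_text, record_name="inferred"):
    """Build an Avro record schema (as a dict) from a CSV with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    fields = [
        {"name": name, "type": infer_type([row[i] for row in data])}
        for i, name in enumerate(header)
    ]
    return {"type": "record", "name": record_name, "fields": fields}


# Hypothetical CSV content; the FlowFile content itself would stay unchanged.
csv_text = "id,name,score\n1,alpha,9.5\n2,beta,7\n"

# The schema travels as a FlowFile attribute next to the untouched content.
attributes = {"avro.schema": json.dumps(infer_avro_schema(csv_text))}
print(attributes["avro.schema"])
```

The point is only the shape of the result: the content passes through as-is and the schema travels as a JSON string in an attribute.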
01-05-2021
11:19 AM
2 Kudos
@garoosy You should look into using "ExecuteSQLRecord" instead of "ExecuteSQL" for large volume data. To be efficient here you would have many records in a single FlowFile. Right now you have a single record per FlowFile, which is not going to be very efficient. The only way for "ExecuteSQL" to handle multiple FlowFile executions in a single connection is if the SQL statement used in every FlowFile is identical. In order to do that, the unique values would need to come from FlowFile attributes. You may find these posts helpful: https://community.cloudera.com/t5/Support-Questions/Nifi-ExectueSQL-how-to-force-a-parameter-to-be-a-string/td-p/240117 https://stackoverflow.com/questions/63330790/using-nifi-executesqlrecord-with-parameterized-sql-statements If you have threads that never seem to complete (you will see a small number in the upper right corner of the processor, e.g. (2)), it is best to get a series of thread dumps (4 - 6) to verify the thread is not progressing. Then you have to determine what the thread is waiting on. Did you try setting a "Max Wait Time" on the processor? It defaults to 0, which means it would wait forever. Hope this helps, Matt
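The "identical statement, values from attributes" idea looks roughly like this in plain Python. This is only an illustration using the standard `sqlite3` module rather than NiFi: the table name `radios` and its rows are made up, and the attribute names follow the `sql.args.N.type` / `sql.args.N.value` convention used for parameterized statements (type 4 is `java.sql.Types.INTEGER`).

```python
import sqlite3

# One statement text shared by every FlowFile; only the bound value differs.
SQL = "SELECT name FROM radios WHERE id = ?"


def flowfile_attributes(radio_id):
    """Attributes carrying the per-FlowFile value (4 = java.sql.Types.INTEGER)."""
    return {"sql.args.1.type": "4", "sql.args.1.value": str(radio_id)}


# Hypothetical in-memory table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE radios (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO radios VALUES (?, ?)",
    [(0, "hub-lr1-0"), (3, "rr1-2627-"), (17, "hub-rr3-g")],
)

results = []
for attrs in (flowfile_attributes(i) for i in (3, 17)):
    # Same statement text each time; the value comes from the attributes.
    row = conn.execute(SQL, (int(attrs["sql.args.1.value"]),)).fetchone()
    results.append(row[0])

print(results)
```

Because the statement text never changes, the database side can reuse it, which is what makes handling many FlowFiles over one connection possible.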