Member since: 07-30-2019
Posts: 3387
Kudos Received: 1617
Solutions: 999

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 125 | 11-05-2025 11:01 AM |
| | 375 | 10-20-2025 06:29 AM |
| | 515 | 10-10-2025 08:03 AM |
| | 358 | 10-08-2025 10:52 AM |
| | 394 | 10-08-2025 10:36 AM |
02-23-2017
02:25 PM
9 Kudos
NiFi works with FlowFiles. Every FlowFile that exists consists of two parts: FlowFile content and FlowFile attributes. While the FlowFile's content lives on disk in the content repository, NiFi holds the "majority" of the FlowFile attribute data in the configured JVM heap memory space. I say "majority" because NiFi swaps attributes to disk on any queue that contains over 20,000 FlowFiles (the default, which can be changed in nifi.properties). Once your NiFi is reporting OutOfMemory (OOM) errors, there is no corrective action other than restarting NiFi. If changes are not made to your NiFi or dataflow, you are surely going to encounter this issue again and again. The default configuration for the JVM heap in NiFi is only 512 MB. This value is set in the bootstrap.conf file:

# JVM memory settings
java.arg.2=-Xms512m
java.arg.3=-Xmx512m

While these defaults may work for some dataflows, they are going to be undersized for others.
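If a larger heap is warranted, it is set by editing those same two lines. The values below are only an example sizing, not a recommendation for your hardware:

# JVM memory settings (example only; size against the RAM actually available on the host)
java.arg.2=-Xms4g
java.arg.3=-Xmx4g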
Simply increasing these values until you stop seeing OOM errors should not be your immediate go-to solution. Very large heap sizes can have adverse impacts on your dataflow as well: garbage collection takes much longer to run against a very large heap, and while garbage collection occurs it is essentially a stop-the-world event. That amounts to a dataflow stoppage for the length of time it takes to complete. I am not saying that you should never set large heap sizes, because sometimes that is really necessary; however, you should evaluate all other options first...

NiFi and FlowFile attribute swapping:

NiFi already has a built-in mechanism to help reduce the overall heap footprint. The mechanism swaps FlowFile attributes to disk when a given connection's queue exceeds the configured threshold. These settings are found in the nifi.properties file:

nifi.swap.manager.implementation=org.apache.nifi.controller.FileSystemSwapManager
nifi.queue.swap.threshold=20000
nifi.swap.in.period=5 sec
nifi.swap.in.threads=1
nifi.swap.out.period=5 sec
nifi.swap.out.threads=4

Swapping, however, will not help if your dataflow is so large that queues everywhere are holding FlowFiles but none has exceeded the threshold for swapping. Also, any time you decrease the swap threshold, more swapping can occur, at some cost to throughput performance. So here are some other things to check for. Some common reasons for running out of heap memory include:

1. A high-volume dataflow with lots of FlowFiles active at any given time across your dataflow. (Increase the configured NiFi heap size in bootstrap.conf to resolve.)

2. Creating a large number of attributes on every FlowFile. More attributes equals more heap usage per FlowFile. Avoid creating unused/unnecessary attributes on FlowFiles. (Increase the configured NiFi heap size in bootstrap.conf and/or reduce the configured swap threshold.)

3. Writing large values to FlowFile attributes. Extracting large amounts of content and writing it to an attribute on a FlowFile will result in high heap usage. Try to avoid creating large attributes when possible. (Increase the configured NiFi heap size in bootstrap.conf and/or reduce the configured swap threshold.)

4. Using the MergeContent processor to merge a very large number of FlowFiles. NiFi cannot merge FlowFiles that are swapped out, so all of these FlowFiles' attributes must be held in heap when the merge occurs. If merging a very large number of FlowFiles is needed, try using two MergeContent processors in series with one another: have the first merge a max of 10,000 FlowFiles and the second merge those 10,000-FlowFile bundles into even larger bundles. (See the sketch after this list; increasing the configured NiFi heap size in bootstrap.conf also helps.)

5. Using the SplitText processor to split one file into a very large number of FlowFiles. Swapping of a large connection queue will not occur until after the queue has exceeded the swap threshold, and the SplitText processor creates all of the split FlowFiles before committing them to the success relationship. This is most commonly seen when SplitText is used to split a large incoming FlowFile by every line; it is possible to run out of heap memory before all the splits can be created. Try using two SplitText processors in series: have the first split the incoming FlowFiles into large chunks and the second split them down even further. (See the sketch after this list; increasing the configured NiFi heap size in bootstrap.conf also helps.)

Note: There are additional processors that can be used for splitting and joining large numbers of FlowFiles, and the same approach as above should be followed for those as well. I only specifically commented on the above since they are the ones most commonly seen being used to deal with very large numbers of FlowFiles.
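To make the two-stage approach in items 4 and 5 concrete, here is a hedged sketch of the relevant processor properties. The property names belong to the stock MergeContent and SplitText processors; all of the counts are example values to be tuned to your flow:

First MergeContent (keeps each merge below the 20,000 FlowFile swap threshold)
  Merge Strategy            = Bin-Packing Algorithm
  Maximum Number of Entries = 10000

Second MergeContent (merges the 10,000-FlowFile bundles into larger bundles)
  Merge Strategy            = Bin-Packing Algorithm
  Maximum Number of Entries = 100

First SplitText (coarse split into large chunks)
  Line Split Count = 10000

Second SplitText (fine split down to single lines)
  Line Split Count = 1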
02-23-2017
01:37 PM
1 Kudo
@mayki wogno Every FlowFile that exists consists of two parts: FlowFile content and FlowFile attributes. While the FlowFile's content lives on disk in the content repository, NiFi holds the "majority" of the FlowFile attribute data in the configured JVM heap memory space. I say "majority" because NiFi swaps attributes to disk on any queue that contains over 20,000 FlowFiles (the default, which can be changed in nifi.properties). Some common reasons for running out of heap memory include:

1. A high-volume dataflow with lots of FlowFiles active at any given time across your dataflow. (Increase the configured NiFi heap size in bootstrap.conf to resolve.)

2. Creating a large number of attributes on every FlowFile. More attributes equals more heap usage per FlowFile. (Increase the configured NiFi heap size in bootstrap.conf and/or reduce the configured swap threshold.)

3. Writing large values to FlowFile attributes. Extracting large amounts of content and writing it to an attribute on a FlowFile will result in high heap usage. Try to avoid creating large attributes when possible. (Increase the configured NiFi heap size in bootstrap.conf and/or reduce the configured swap threshold.)

4. Using the MergeContent processor to merge a very large number of FlowFiles. NiFi cannot merge FlowFiles that are swapped out, so all of these FlowFiles' attributes must be held in heap when the merge occurs. If merging a very large number of FlowFiles is needed, try using two MergeContent processors in series with one another: have the first merge a max of 10,000 FlowFiles and the second merge those 10,000-FlowFile bundles into even larger bundles. (Increasing the configured NiFi heap size in bootstrap.conf also helps.)

5. Using the SplitText processor to split one file into a very large number of FlowFiles. Swapping of a large connection queue will not occur until after the queue has exceeded the swap threshold, and the SplitText processor creates all of the split FlowFiles before committing them to the success relationship. This is most commonly seen when SplitText is used to split a large incoming FlowFile by every line; it is possible to run out of heap memory before all the splits can be created. Try using two SplitText processors in series: have the first split the incoming FlowFiles into large chunks and the second split them down even further. (Increasing the configured NiFi heap size in bootstrap.conf also helps.)

Thanks, Matt
02-23-2017
01:12 PM
1 Kudo
@Ramakrishnan V You will need to use the following curl command to obtain a token for your LDAP user:
curl 'https://<hostname>:<port>/nifi-api/access/token' -H 'Content-Type: application/x-www-form-urlencoded; charset=UTF-8' --data 'username=admin&password=admin' --compressed --insecure

Once you have your token, you will need to pass it as the bearer on all subsequent curl commands you execute against the NiFi API by adding the following to your curl commands:

-H 'Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJjbj1hZG1pbixkYz1leGFtcGxlLGRjPW9yZyIsImlzcyI6IkxkYXBQcm92aWRlciIsImF1ZCI6IkxkYXBQcm92aWRlciIsInByZWZlcnJlZF91c2VybmFtZSI6ImFkbWluIiwia2lkIjoxLCJleHAiOjE0ODcxNDM2OTEsImlhdCI6MTQ4NzEwMDQ5MX0.GwwJ0Yz4_KXUAMNIH500jw8YcIk3e6ZdcT3LCrrkHjc'

The odd string above is an example of the token you will get back from the first command.
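Putting the two steps together, a follow-up request might look like the sketch below. The hostname, port, and credentials are placeholders, and /nifi-api/flow/about is simply one lightweight endpoint to verify the token works:

# Request a token and store it (same LDAP login endpoint as above)
TOKEN=$(curl -s 'https://<hostname>:<port>/nifi-api/access/token' \
  -H 'Content-Type: application/x-www-form-urlencoded; charset=UTF-8' \
  --data 'username=admin&password=admin' --insecure)

# Pass the token as the bearer on any subsequent API call
curl -s 'https://<hostname>:<port>/nifi-api/flow/about' \
  -H "Authorization: Bearer $TOKEN" --insecure

Thanks, Matt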
02-22-2017
10:08 PM
1 Kudo
@Joe Petro Yes, this is very doable... NiFi automatically creates a FlowFile attribute called "filename" on every FlowFile that is created. You can use this existing attribute to specify the target HDFS directory, as in the sketch below. Of course, you will want to modify it for your complete target path.
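The original post illustrated this with a screenshot; as a rough equivalent, the PutHDFS "Directory" property can reference the attribute via Expression Language (the /data/landing base path here is purely hypothetical):

PutHDFS
  Directory = /data/landing/${filename}

Thanks, Matt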
02-22-2017
01:54 PM
@mayki wogno No problem... If one of the answers helped drive you to a solution to your question, please accept that answer to help drive this community forward.
02-22-2017
01:36 PM
2 Kudos
@Pradhuman Gupta Which "Protocol" are you using in your PutSplunk processor?
There is no assurance of delivery if you are using UDP; with the TCP protocol there is confirmed delivery. You could use NiFi's provenance to track the FlowFiles processed by the PutSplunk processor. This will allow you to get the details on FlowFiles that have "SEND" provenance events associated with them. Thanks, Matt
02-21-2017
06:49 PM
2 Kudos
@Raj B 1. The main intent of NiFi provenance is data governance: the ability to look back at the life of a FlowFile. It can tell you where a FlowFile originated, what parent FlowFile it was part of, how many parent FlowFiles were used to create it, what changes were made to it, where it was sent, when it was terminated from NiFi, etc. NiFi provenance also provides a means to view or replay FlowFiles that are no longer anywhere in your dataflow (provided the FlowFile's content still exists in the content repository's archive) at any point in your dataflow. Examples:

- Some downstream system expected to receive file "ABC" over the weekend from NiFi. You can use NiFi's data provenance to see exactly when file "ABC" was received by NiFi and exactly what NiFi did to file "ABC" as it traversed your dataflows.

- A FlowFile "XYZ" was expected to route through your dataflow to some destination "G". Upon searching provenance, it was discovered "XYZ" was routed down the wrong path. You could correct your dataflow routing issues and use data provenance to replay "XYZ" from just prior to the dataflow correction.

2. NiFi's provenance repository retains all provenance events generated by your dataflow until either the retention time or the max disk usage property is met. When either of those conditions is met, the oldest provenance events are deleted first. There is no way to selectively decide which provenance events are retained in the repository.

3. The Provenance API provides a means for running queries directly against the provenance data stored local to a particular NiFi instance. The SiteToSiteProvenanceReportingTask provides a way of sending provenance events to another system for perhaps longer-term storage. Since provenance events do not contain any FlowFile content, only provenance events stored locally within a NiFi instance can be used to view or replay any content.
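As a hedged sketch of querying the Provenance API with curl: the request shape below follows the NiFi 1.x REST API, but verify the endpoint and searchable field names against your version's documentation; <query-id> and $TOKEN are placeholders.

# Submit an asynchronous provenance query for SEND events
curl -s -X POST 'https://<hostname>:<port>/nifi-api/provenance' \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"provenance":{"request":{"maxResults":100,"searchTerms":{"EventType":"SEND"}}}}' \
  --insecure

# The response includes a query id; poll it for results, then delete the query when done
curl -s 'https://<hostname>:<port>/nifi-api/provenance/<query-id>' -H "Authorization: Bearer $TOKEN" --insecure
curl -s -X DELETE 'https://<hostname>:<port>/nifi-api/provenance/<query-id>' -H "Authorization: Bearer $TOKEN" --insecure

Thanks, Matt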
02-21-2017
04:37 PM
@mayki wogno One thing you could do is set "FlowFile Expiration" on the connection containing the "merged" relationship, and set the prioritizer to "NewestFlowFileFirstPrioritizer". FlowFile expiration is measured against the age of the FlowFile (from creation time to now), not how long it has been in a particular connection. If the FlowFile's age exceeds this configured value, it is purged from the queue.
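In the connection's configuration dialog, that combination would look roughly like this (the 5 min expiration is purely an example value):

Connection settings (sketch)
  FlowFile Expiration   = 5 min
  Selected Prioritizers = NewestFlowFileFirstPrioritizer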
02-21-2017
03:44 PM
1 Kudo
@Andy Liang The ConsumeJMS and PublishJMS processors can be used with IBM MQ. They require you to set up a "JMSConnectionFactoryProvider" controller service to facilitate the IBM MQ connection. You will need to download the IBM MQ client library onto the server where your NiFi is running.
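A hedged sketch of that controller service configuration (the implementation class, library path, and broker address are illustrative of a typical IBM MQ setup; verify them against your MQ client version):

JMSConnectionFactoryProvider
  MQ ConnectionFactory Implementation = com.ibm.mq.jms.MQQueueConnectionFactory
  MQ Client Libraries path            = /opt/ibm/mq/java/lib    (wherever you placed the downloaded client jars)
  Broker URI                          = <mq-host>:1414

Matt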
02-21-2017
03:09 PM
1 Kudo
@mayki wogno You can reduce or even eliminate the WARN messages by placing a MergeContent processor between your first and second DeleteHDFS processors that merges using "path" as the value of the "Correlation Attribute Name" property. The resulting merged FlowFile(s) would still have the same "path", which would be used by the second DeleteHDFS to remove your directory.
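A minimal sketch of that MergeContent configuration (the entry count is an example value; the key setting is the correlation attribute):

MergeContent
  Merge Strategy             = Bin-Packing Algorithm
  Correlation Attribute Name = path    (bins FlowFiles that share the same HDFS path)
  Maximum Number of Entries  = 1000

Matt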