Created 10-09-2024 09:02 PM
Hi Team,
Currently I am using a 5-node NiFi 1.27 cluster.
Each node has 47 GB of RAM.
My NiFi heap memory keeps increasing, and the service crashes after roughly 150 hours of uptime.
Data volume is around 100 GB per day.
I am using a variety of processors (Kafka, split and merge, distributed cache processors, attribute-level processors, Hive3Streaming, ExecuteSQL select and put (PostgreSQL, SSIS), Kudu processors, etc.).
Please find my JVM settings and nifi.properties below:
# JVM memory settings
java.arg.2=-Xms16g
java.arg.3=-Xmx20g
java.arg.7=-XX:ReservedCodeCacheSize=512m
java.arg.9=-XX:+UseCodeCacheFlushing
java.arg.23=-XX:ParallelGCThreads=8
java.arg.24=-XX:ConcGCThreads=4
java.arg.25=-XX:G1ReservePercent=10
java.arg.26=-XX:+UseStringDeduplication
java.arg.27=-XX:InitiatingHeapOccupancyPercent=25
java.arg.28=-XX:MaxGCPauseMillis=200
java.arg.40=-XX:SurvivorRatio=8
java.arg.41=-XX:NewRatio=3
java.arg.42=-Xmn6g
java.arg.13=-XX:+UseG1GC
nifi.content.repository.archive.max.retention.period=7 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.provenance.repository.max.storage.time=30 hours
nifi.provenance.repository.max.storage.size=10 GB
nifi.provenance.repository.rollover.time=10 mins
nifi.provenance.repository.rollover.size=100 MB
nifi.provenance.repository.query.threads=2
nifi.provenance.repository.index.threads=2
nifi.queue.swap.threshold=10000
Tagging @SAMSAL @MattWho for quick response
Thanks,
Krish
Created 10-10-2024 09:54 AM
@Krish98
Most NiFi heap memory issues are directly related to dataflow design. The Apache NiFi documentation for the individual components generally does a good job of listing "System Resource Considerations". So the first step would be to review the documentation for the components you are using and see which ones list "MEMORY" as a system resource consideration.
Example:
SplitContent 1.27.0
Sharing your configuration of those components would then make it easier to provide suggestions that may help you.
- The split and merge processors, depending on how they are configured, can utilize a lot of heap.
- The Distributed Map Cache also resides in heap and can contribute to significant heap usage depending on its configuration and the size of what is being written to it.
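As a rough way to reason about the Distributed Map Cache point above, the heap it can consume is bounded by its configured maximum entry count times the average entry size. The numbers below are assumptions for illustration, not values from this thread:

```python
# Back-of-the-envelope heap estimate for a DistributedMapCacheServer.
# Both values are assumptions; substitute your own processor settings.
max_entries = 10_000          # "Maximum Cache Entries" property (assumed)
avg_entry_bytes = 50 * 1024   # average key + value size (assumed 50 KiB)

heap_mib = max_entries * avg_entry_bytes / 1024**2
print(f"~{heap_mib:.0f} MiB of heap held by the cache when full")
```

If the values you are caching are large (whole FlowFile contents rather than small lookup keys), this estimate grows quickly, which is why cache sizing matters for heap.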
Beyond components:
- NiFi loads the entire flow.json.gz (uncompressed) into heap memory. This includes any NiFi Templates (deprecated in Apache NiFi 1.x and removed in the newer Apache NiFi 2.x versions). Templates should no longer be used. Any templates listed in the NiFi templates UI should be downloaded, so they are stored outside of NiFi, and then deleted from NiFi to reduce heap usage.
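One quick way to gauge this cost is to check how large the flow is once uncompressed, since that is roughly the form NiFi holds in heap. The sketch below uses a stand-in file so it runs anywhere; on a real node you would point it at your conf directory (path is an assumption, e.g. /opt/nifi/conf/flow.json.gz):

```python
# Sketch: measure the uncompressed size of flow.json.gz.
# A stand-in flow is written first so the snippet is self-contained;
# replace flow_path with your actual NiFi conf location.
import gzip

flow_path = "/tmp/flow.json.gz"  # assumption: real path differs per install
with gzip.open(flow_path, "wt") as f:       # create the stand-in flow
    f.write('{"flowContents": {}}')

uncompressed = len(gzip.open(flow_path, "rb").read())
print(f"uncompressed flow size: {uncompressed} bytes")
```

A large uncompressed flow (for example, one bloated by leftover templates) translates directly into baseline heap usage.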
- NiFi FlowFiles - FlowFiles are what transition between components via connections in your dataflow(s). A FlowFile consists of two parts: the FlowFile content, stored in content claims in the content_repository, and the FlowFile metadata/attributes, held in heap memory and persisted to the flowfile_repository. So if you are creating a lot of FlowFile attributes, or very large FlowFile attributes (like extracting content into an attribute), that can result in high heap usage. A connection does have a default threshold at which a swap file is created to reduce heap usage. Swap files are created with 10,000 FlowFiles in each swap file. The first swap file would not be created until a connection on a specific node reached 20,000 queued FlowFiles, at which point 10,000 would be moved to a swap file and the 10,000 highest-priority FlowFiles would remain in heap. The default "Back Pressure Object Threshold" on a connection is 10,000, meaning that with defaults no connection would ever create a swap file.
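The swap arithmetic described above can be sketched as follows (assuming the default nifi.queue.swap.threshold=10000 from the original post):

```python
# Sketch of the per-connection swap-file arithmetic described above.
# Swapping begins only once a connection on a node holds twice the
# swap threshold; each swap file then holds one full threshold batch.
SWAP_THRESHOLD = 10_000  # nifi.queue.swap.threshold default

def swap_files_for(queued: int, threshold: int = SWAP_THRESHOLD) -> int:
    """Number of swap files for a given per-node queue depth."""
    if queued < 2 * threshold:
        return 0  # everything stays in heap
    # 10,000 highest-priority FlowFiles stay in heap; the remainder
    # is swapped out in full batches of `threshold`.
    return (queued - threshold) // threshold

print(swap_files_for(19_999))  # below 20,000: no swap file yet
print(swap_files_for(20_000))  # first swap file is created
```

This also illustrates the closing point: with the default back pressure object threshold of 10,000, a connection fills and applies back pressure long before the 20,000-FlowFile swap trigger is reached.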
Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.
Thank you,
Matt