
NiFi Heap Accumulation Issue

New Contributor

Hi Team,

Currently I am using a 5-node NiFi cluster on version 1.27.
Each node has 47 GB of RAM.

My NiFi heap usage keeps increasing until the node crashes, after roughly 150 hours of uptime.


Data volume is around 100 GB per day.

I am using a variety of processors: Kafka, split and merge, distributed cache processors, attribute-level processors, Hive3Streaming, select and put SQL processors including ExecuteSQL (Postgres, SSIS), Kudu processors, etc.

Please find the relevant JVM settings (bootstrap.conf) and nifi.properties values below:

# JVM memory settings
java.arg.2=-Xms16g
java.arg.3=-Xmx20g
java.arg.7=-XX:ReservedCodeCacheSize=512m
java.arg.9=-XX:+UseCodeCacheFlushing
java.arg.23=-XX:ParallelGCThreads=8
java.arg.24=-XX:ConcGCThreads=4
java.arg.25=-XX:G1ReservePercent=10
java.arg.26=-XX:+UseStringDeduplication
java.arg.27=-XX:InitiatingHeapOccupancyPercent=25
java.arg.28=-XX:MaxGCPauseMillis=200
java.arg.40=-XX:SurvivorRatio=8
java.arg.41=-XX:NewRatio=3
java.arg.42=-Xmn6g
java.arg.13=-XX:+UseG1GC

# nifi.properties
nifi.content.repository.archive.max.retention.period=7 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.provenance.repository.max.storage.time=30 hours
nifi.provenance.repository.max.storage.size=10 GB
nifi.provenance.repository.rollover.time=10 mins
nifi.provenance.repository.rollover.size=100 MB
nifi.provenance.repository.query.threads=2
nifi.provenance.repository.index.threads=2
nifi.queue.swap.threshold=10000

Tagging @SAMSAL @MattWho for a quick response.


Thanks,
Krish

1 REPLY

Master Mentor

@Krish98 

Most NiFi heap memory issues are directly related to dataflow design. The Apache NiFi documentation for the individual components generally does a good job of reporting "System Resource Considerations". So the first step would be to review the documentation for the components you are using and see which of them list MEMORY as a system resource consideration.

Example: SplitContent 1.27.0 lists MEMORY under its System Resource Considerations.

Sharing your configuration of those components would then make it easier to offer concrete suggestions.

- Split and merge processors, depending on how they are configured, can utilize a lot of heap (see the MergeContent sketch just after this list).
- The Distributed Map Cache also resides in heap and can contribute significant heap usage, depending on its configuration and the size of what is being written to it.
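
As an illustration of the kind of component configuration worth sharing, here is a sketch of the MergeContent properties that bound how many FlowFiles can be allocated to bins at one time (and therefore how much FlowFile attribute data is held in heap while merging). The values below are placeholders for illustration only, not recommendations:

Merge Strategy            = Bin-Packing Algorithm
Maximum number of Bins    = 5
Minimum Number of Entries = 1
Maximum Number of Entries = 10000
Minimum Group Size        = 0 B
Maximum Group Size        = 1 GB
Max Bin Age               = 5 min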

Beyond components:
- NiFi loads the entire flow.json.gz (uncompressed) into heap memory. This includes any NiFi templates (deprecated in Apache NiFi 1.x and removed in Apache NiFi 2.x). Templates should no longer be used. Any templates listed in the NiFi templates UI should be downloaded, so they are stored outside of NiFi, and then deleted from NiFi to reduce heap usage.
- NiFi FlowFiles are what transition between components via connections in your dataflow(s). A FlowFile consists of two parts: FlowFile content, stored in content claims in the content_repository, and FlowFile metadata/attributes, held in heap memory and persisted to the flowfile_repository. So if you are creating a lot of FlowFile attributes, or creating very large FlowFile attributes (for example, extracting content into an attribute), that can result in high heap usage. A connection does have a threshold at which a swap file is created to reduce heap usage. Swap files are created with 10,000 FlowFiles each. The first swap file would not be created until a connection on a specific node reached 20,000 queued FlowFiles, at which point 10,000 would be moved to a swap file and the 10,000 highest-priority FlowFiles would remain in heap. The default "back pressure object threshold" on a connection is 10,000, meaning that with defaults no connection would ever create a swap file (see the worked defaults just below).
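
To make the swap arithmetic concrete, here is a sketch using the defaults quoted above (the back pressure object threshold is configured per connection in the UI, not in nifi.properties):

# nifi.properties default
nifi.queue.swap.threshold=10000

# Per-connection default: Back Pressure Object Threshold = 10000
#
# A queue on a node only begins swapping once it holds
# 2 x nifi.queue.swap.threshold = 20,000 FlowFiles. With the default back
# pressure threshold of 10,000, the source component stops being scheduled at
# 10,000 queued FlowFiles, so the 20,000 mark is never reached and all queued
# FlowFile attributes remain in heap.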

Please help our community thrive. If any of the suggestions/solutions provided helped you solve your issue or answer your question, please take a moment to log in and click "Accept as Solution" on one or more of them.

Thank you,
Matt