Dataflow to check JVM usage

Hi,

I am new to NiFi and would like to explore JVM heap usage. NiFi holds the majority of FlowFile attribute data in the configured JVM heap memory space, so I would like a sample dataflow that shows how the JVM heap size is exceeded, resulting in OOM issues. How can we set up a simple dataflow that produces enough FlowFile attribute data to cause an OOM?

1 ACCEPTED SOLUTION

Super Mentor

@Gillu Varghese

In order to build a dataflow like this, you just need to know which FlowFiles' attributes/metadata are held in heap and which are swapped to disk. To understand that, take a look at this HCC article:
https://community.hortonworks.com/articles/184990/dissecting-the-nifi-connection-heap-usage-and-perf...

So any connection with queued data will have some amount of heap footprint.

You can use an UpdateAttribute processor to add additional attributes to FlowFiles.

But you also have to understand that heap space is not only used by queued FlowFiles.

Thanks,

Matt

7 REPLIES 7

@Matt Clarke can you please help me out

Super Mentor
@Gillu Varghese

*** HCC forum tip: Avoid responding to answers in new answers. Instead respond to an existing answer via a comment.

-

You would not want to decrease the swap threshold. You would increase this value so that more FlowFiles are held in heap on each connection before being swapped out.
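
As a rough illustration of why the swap threshold matters, here is a back-of-the-envelope sketch in Python. It assumes the threshold in question is the `nifi.queue.swap.threshold` property in nifi.properties (default 20000 FlowFiles per connection) and uses a deliberately simplified model: at most `swap_threshold` FlowFiles per connection keep their attributes in heap, and each attribute costs only its key and value characters. The function name, the attribute counts, and the sizes are all hypothetical, and real usage will be higher because of Java object overhead.

```python
def estimate_attribute_heap_bytes(
    queued,
    swap_threshold=20000,     # assumed default of nifi.queue.swap.threshold
    attrs_per_flowfile=10,    # hypothetical number of attributes per FlowFile
    avg_key_chars=20,         # hypothetical average attribute name length
    avg_value_chars=100,      # hypothetical average attribute value length
    bytes_per_char=2,         # assumption: 2 bytes per character in a Java String
):
    """Rough heap estimate for attributes queued on ONE connection.

    Simplified model: FlowFiles beyond the swap threshold are treated as
    swapped to disk and therefore not counted against the heap.
    """
    in_heap = min(queued, swap_threshold)
    per_flowfile = attrs_per_flowfile * (avg_key_chars + avg_value_chars) * bytes_per_char
    return in_heap * per_flowfile


if __name__ == "__main__":
    for threshold in (20000, 40000):
        estimate = estimate_attribute_heap_bytes(50000, swap_threshold=threshold)
        print(f"threshold={threshold}: ~{estimate / (1024 * 1024):.1f} MiB of attribute data in heap")
```

Under this model, raising the threshold increases the estimate only when the queue actually holds more FlowFiles than the old threshold; below that point, adding more or larger attributes is what grows the footprint.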

Heap space is also going to be used by processors while they process a FlowFile. The amount of heap used depends on the processor and what it is doing. Any processor that produces a "new" FlowFile from an incoming FlowFile or batch of FlowFiles will use heap space while generating that new FlowFile or batch of FlowFiles (MergeContent and SplitText are good examples of both).

I am not sure what you are trying to measure here. Is the goal simply to keep loading more into heap until you encounter an OOM error in NiFi? Considering that heap usage is not purely a measure of queued (non-swapped) FlowFiles, I am not sure there is much value in such a test. Since NiFi runs as a single JVM instance, there is no way within NiFi to list exactly what is using what amount of the allocated heap space.

As far as increasing the size of attributes goes, you could feed in a file with content and use the ExtractText processor to extract that content into a FlowFile attribute. Depending on the size of the FlowFile content loaded, this could add up to a lot of heap usage per FlowFile.

Thanks,
Matt

@Matt Clarke

Thank you for the information. I wanted to monitor heap utilization by keeping the heap size fixed, progressively increasing the FlowFile count, and then increasing the FlowFile load.

Super Mentor

@Gillu Varghese

Keep in mind how JVM heap space works. At a very high level, objects in heap are not cleared out as soon as they are no longer used. So a FlowFile's attributes will exist in heap while it is queued, and when that FlowFile no longer exists in the flow (it reached the end of the flow, for example), that heap space is likely to still be occupied. It is the job of Java Garbage Collection (GC) to free unused heap space. So once heap utilization is high enough that free space is needed by the JVM, GC will run to create that free space.

So even after running a heavy flow, when no FlowFiles are left anywhere in your dataflows, you may still observe high reported heap usage. That is normal and expected.
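
If you want to watch the reported heap usage while a test flow runs, one option is to poll NiFi's system diagnostics over its REST API. The following is a minimal Python sketch, not a definitive tool. It assumes an unsecured NiFi instance on localhost:8080, and it assumes the response carries `usedHeapBytes` and `maxHeapBytes` under `systemDiagnostics.aggregateSnapshot`; verify both against your NiFi version. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: an unsecured NiFi instance listening on localhost:8080.
DIAGNOSTICS_URL = "http://localhost:8080/nifi-api/system-diagnostics"


def fetch_system_diagnostics(url=DIAGNOSTICS_URL, timeout=10):
    """Fetch the system diagnostics JSON document from NiFi."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def heap_usage(diagnostics):
    """Return (used_bytes, max_bytes, used_fraction) from a diagnostics document.

    Assumed field names: systemDiagnostics.aggregateSnapshot.usedHeapBytes
    and .maxHeapBytes.
    """
    snapshot = diagnostics["systemDiagnostics"]["aggregateSnapshot"]
    used = snapshot["usedHeapBytes"]
    maximum = snapshot["maxHeapBytes"]
    return used, maximum, used / maximum
```

Calling `heap_usage(fetch_system_diagnostics())` every few seconds during a run should show the sawtooth pattern described above: usage climbs while FlowFiles queue and drops only after GC runs, not at the moment the queues empty.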

-

Thanks,

Matt

-

If you found that this answer addressed your original question, please take a moment to log in and click "accept".

@Matt Clarke

Thanks, Matt, for the explanation. It was very informative. One more question: if I use the UpdateAttribute processor to add attributes, do I have to add lots of them to increase heap usage, or would decreasing the swap threshold suffice? Please correct me if I am wrong. Can you also explain what, other than queued FlowFiles, uses heap space?

@Matt Clarke

Is there any way to load FlowFile attributes heavily?