How to scale vertically for an HDF instance

Rising Star

When we run HDF on a single machine, do all the data flows built on that machine run under a single JVM?

I saw that the NiFi documentation describes how you can control spilling data from the JVM to hard disk. But is there an option to run multiple JVMs, say one for each flow? Also, how large a JVM heap do you usually use for an edge node?

1 ACCEPTED SOLUTION

Master Guru

When you run HDF on a single machine, it is a single JVM process. That instance has three internal repositories (content, flow file, and provenance), and the configuration controls how much of each repository is retained on disk. For high performance, it is best to put each repository on a separate disk.
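As a rough sketch of what that separation might look like in conf/nifi.properties: the property names below are the standard NiFi repository settings, but the /mnt/disk1, /mnt/disk2, and /mnt/disk3 mount points and the retention values are hypothetical placeholders, not recommendations.

```properties
# Hypothetical layout: one physical disk per repository (mount points are assumptions)
nifi.flowfile.repository.directory=/mnt/disk1/flowfile_repository
nifi.content.repository.directory.default=/mnt/disk2/content_repository
nifi.provenance.repository.directory.default=/mnt/disk3/provenance_repository

# Illustrative retention limits for what is kept on disk (values are assumptions)
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.provenance.repository.max.storage.time=24 hours
nifi.provenance.repository.max.storage.size=1 GB
```

The right retention values depend on disk size and on how much history you need to keep for replay and lineage.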

One instance can have many logical flows, which can optionally be grouped inside process groups. There can be many disconnected logical flows within one instance. There are discussions about future capabilities where logical flows could be restricted to certain groups of users.

NiFi's default heap size out of the box is 512MB, so at the moment that would probably be the starting point for an edge node.

3 REPLIES

Rising Star

Thanks @bbende!! So we do not recommend scaling NiFi vertically by increasing the JVM heap to a really large size?

Master Guru

We definitely do recommend increasing the heap appropriately for the given use case. I was focused on some of the other aspects and forgot to mention that 🙂

In conf/bootstrap.conf there are the following default settings which can be increased:

# JVM memory settings
java.arg.2=-Xms512m
java.arg.3=-Xmx512m

It is hard to recommend a general heap size for all use cases, but anywhere from 512MB up to 8GB is common.
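For illustration only, raising the heap to 4GB would mean editing the same two lines in conf/bootstrap.conf. The 4g value is an arbitrary pick from the range above, not a recommendation, and NiFi has to be restarted for the change to take effect.

```properties
# JVM memory settings (4g is an example value; size it to your own flows)
java.arg.2=-Xms4g
java.arg.3=-Xmx4g
```

Setting the initial and maximum heap to the same value, as the defaults do, avoids heap resizing at runtime.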