How to scale vertically for an HDF instance

When we run HDF on a single machine, do all the data flows built on that machine run under a single JVM?

I saw NiFi documentation that talks about how you can control spilling data from the JVM to the hard disk. But is there an option to run multiple JVMs, say one for each flow? Also, how big a JVM do you usually have for an edge node?

1 ACCEPTED SOLUTION

When you run HDF on a single machine it is a single JVM process. That instance has three internal repositories (content, flow file, and provenance), and the configuration controls how much of each repository is retained on disk. For high performance it is best to put each repository on a separate disk.
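
As a rough sketch, pointing each repository at its own disk might look like this in conf/nifi.properties (the /disk1, /disk2, and /disk3 mount points are assumptions for illustration, not defaults):

# Example: one repository per physical disk (paths are hypothetical)
nifi.flowfile.repository.directory=/disk1/flowfile_repository
nifi.content.repository.directory.default=/disk2/content_repository
nifi.provenance.repository.directory.default=/disk3/provenance_repository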

One instance can have many logical flows, which can optionally be grouped inside process groups. There can be many disconnected logical flows within one instance. There are discussions about future capabilities where logical flows could be restricted to only certain groups of users.

The out-of-the-box default memory setting for NiFi is 512 MB, so at the moment that would probably be the starting point for an edge node.

3 REPLIES

Thanks @bbende! So we do not recommend scaling NiFi vertically by increasing the JVM heap size to a really large value?

We definitely do recommend increasing the heap appropriately for the given use case. I was focused on some of the other aspects and forgot to mention that 🙂

In conf/bootstrap.conf there are the following default settings which can be increased:

# JVM memory settings
java.arg.2=-Xms512m
java.arg.3=-Xmx512m

It is hard to recommend a general heap size for all use cases, but anywhere from 512MB up to 8GB is common.
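
For a hypothetical larger edge node, you might raise both values in conf/bootstrap.conf, keeping -Xms and -Xmx equal so the heap is allocated up front and does not resize at runtime (the 4g figure is just an illustration within that range, not a recommendation):

# JVM memory settings (example sizing, not a default)
java.arg.2=-Xms4g
java.arg.3=-Xmx4g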
