How to scale vertically for a HDF instance

Solved

New Contributor

When we run HDF on a single machine, do all the data flows built on that machine run under a single JVM?

I did see the NiFi documentation that talks about how you can control spilling data from the JVM to the hard disk. But is there an option to run multiple JVMs, say one for each flow? Also, how big of a JVM heap do you usually have for an edge node?

1 ACCEPTED SOLUTION

Accepted Solutions

Re: How to scale vertically for a HDF instance

When you run HDF on a single machine it is a single JVM process. That instance has three internal repositories (content, flow file, and provenance), and the configuration controls how much of each repository is retained on disk. For high performance it is best to put each of the repositories on a separate disk.
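For reference, the repository locations are set in conf/nifi.properties. A sketch of the relevant properties, with illustrative paths pointing each repository at its own disk (the mount points are hypothetical; the defaults are relative paths under the NiFi install directory):

```properties
# conf/nifi.properties -- repository locations (paths are illustrative)
# Placing each repository on its own physical disk avoids I/O contention.
nifi.flowfile.repository.directory=/disk1/flowfile_repository
nifi.content.repository.directory.default=/disk2/content_repository
nifi.provenance.repository.directory.default=/disk3/provenance_repository
```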

One instance can have many logical flows, which can optionally be grouped inside process groups. There can be many disconnected logical flows within one instance. There are discussions about future capabilities where logical flows could be restricted to only certain groups of users.

The default memory setting for NiFi out of the box is a 512 MB heap, so at the moment that would probably be the starting point for an edge node.

3 REPLIES 3


Re: How to scale vertically for a HDF instance

New Contributor

Thanks @bbende!! So do we not recommend scaling NiFi vertically by increasing the JVM heap size to a really large value?

Re: How to scale vertically for a HDF instance

We definitely do recommend increasing the heap appropriately for the given use case. I was focused on some of the other aspects and forgot to mention that :)

In conf/bootstrap.conf there are the following default settings which can be increased:

# JVM memory settings
java.arg.2=-Xms512m
java.arg.3=-Xmx512m

It is hard to recommend a general heap size for all use cases, but anywhere from 512MB up to 8GB is common.
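As a quick sketch of that change, the snippet below raises both settings to 4 GB. It writes a sample file mirroring the defaults rather than touching a real install; with an actual NiFi instance you would edit conf/bootstrap.conf in place and restart NiFi (the 4 GB figure is just an example from the common range above):

```shell
# Create a sample bootstrap.conf mirroring the default JVM settings
cat > bootstrap.conf <<'EOF'
# JVM memory settings
java.arg.2=-Xms512m
java.arg.3=-Xmx512m
EOF

# Raise both -Xms and -Xmx to 4g; keeping them equal avoids heap resizing pauses
sed -i 's/^java\.arg\.2=-Xms.*/java.arg.2=-Xms4g/' bootstrap.conf
sed -i 's/^java\.arg\.3=-Xmx.*/java.arg.3=-Xmx4g/' bootstrap.conf

# Show the result
grep '^java\.arg\.[23]' bootstrap.conf
```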
