
Hardware recommendation for HDF/Nifi cluster

Expert Contributor

Hi All,

- This seems like an obvious question, so forgive me if it is redundant: what hardware configuration would be suitable for setting up HDF 2.x on VMs for an 8-node cluster?

- I found an old document which does help: link

- It seems like NiFi might need more cores rather than more RAM. My current setup of 12GB RAM and 6 cores per node is not working (note: the master has 6GB RAM, which seems to be a bottleneck).

- After going through the link, I am thinking of the following, but I am not sure if it is optimal:

24 cores / 20GB RAM / 250-500GB disk.

Does this seem like an optimal configuration (considering the ratio: more cores vs. more RAM)? To give more context, I currently don't have any specific throughput requirements and am using NiFi for some batch jobs, log processing, etc. However, I do want a stable cluster setup that we could also keep using if usage increases in the future.
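For reference, the NiFi JVM heap on each node is set in conf/bootstrap.conf. The values below are only a hedged starting point for a node with 20GB RAM (reserving roughly 8GB for the JVM heap and leaving the rest for the OS and the content/FlowFile repositories), not a recommendation from this thread:

```properties
# conf/bootstrap.conf -- example heap settings (assumed sizing, tune per workload)
# NiFi ships with a 512m default; a 20GB-RAM node might reserve ~8GB for the JVM.
java.arg.2=-Xms8g
java.arg.3=-Xmx8g
```

Setting -Xms equal to -Xmx avoids heap resizing pauses at runtime; the actual size you need depends on flow design, as discussed below.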

Thanks

Obaid

1 ACCEPTED SOLUTION

3 REPLIES


Expert Contributor

Great, thanks for your response.

Do you think there is a relationship between cores and RAM? Meaning, if you have X cores, should you have a corresponding amount of RAM? Is there any dependency or good practice here? We can think in terms of minimum requirements, assuming we will be running a lot of lightweight flows (batch, scheduled).

I mean, more cores will let us run more flows, so I am wondering whether 32GB RAM will be enough for 20 cores if I go with HDF 2.x. Say in the future all 20 cores become busy; would RAM then become an issue?

Thanks

Super Mentor
@Obaid Salikeen

There is no direct correlation between CPU count and heap memory usage. Heap usage is more processor- and flow-implementation-specific. Processors that do things like splitting or merging FlowFiles can end up using more heap. FlowFile attributes live in heap memory. NiFi does swap FlowFile attributes to disk per connection based on FlowFile queue count; the default of 20,000 queued FlowFiles will trigger swapping to start on a connection. But there is no swap threshold based on FlowFile attribute map size. If a user writes large attribute values to FlowFiles, those FlowFiles' heap usage is going to be higher. You see this in scenarios where large parts of the FlowFile content are extracted into a FlowFile attribute. So when it comes to heap/memory usage, it comes down to flow design more than to any correlation with the number of CPUs.
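The per-connection swap threshold described above is configurable in conf/nifi.properties; the value shown here is the stock default, not a tuning recommendation:

```properties
# conf/nifi.properties
# Once a single connection's queue exceeds this many FlowFiles,
# NiFi begins swapping queued FlowFile (attribute) data to disk
# to bound heap usage. Note this is a count threshold, not a
# threshold on attribute map size.
nifi.queue.swap.threshold=20000
```

Lowering this value reduces heap pressure from deep queues at the cost of extra disk I/O when flows back up.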

Thanks,
Matt