New Contributor
Posts: 3
Registered: ‎02-01-2017
Accepted Solution

Minimum hardware configuration for a small cluster with small data


We have an application backed by a 'small' database (~1TB of data in Oracle SE). The database is growing slowly at the moment (about 1GB per month), but that rate will increase in the future. A lot of this data is historical / cold. We are considering a product called Gluent to help us offload data to a Hadoop cluster. Hadoop is completely new to us, though, and at first glance it seems like overkill. That said, I can see many advantages to having a Hadoop cluster as a 'data lake' for both the database data and various application data that is not currently stored in the database (e.g. data files that reside on the application server). Based on my (very limited) understanding of Gluent, for our usage the majority of the processing would still be in Oracle, with only occasional queries to the 'cold' data in Hadoop, so responsiveness is not a very high priority. In other words, I believe we might be able to get away with 'low end' specs.


My question is: what are the minimum hardware specs for a small cluster that fits our scenario? I'm sure there's an element of "it depends" in the answer, but I want to verify that it is feasible to start a 'production' cluster with minimal resources (e.g. balanced nodes w/ ~4-6 CPUs, ~16-32GB RAM, and ~500GB-1TB disks), with the ability to scale up in the years to come.

Cloudera Employee
Posts: 43
Registered: ‎10-07-2016

Re: Minimum hardware configuration for a small cluster with small data



Josh here, from Cloudera. Thanks for reaching out on this.


As for whether your outlined configuration would work, the short answer is: perhaps.


You might have already seen it, but I'll point to this blog post as a reference. It's a good read, and it includes a matrix for deciding the specs for your cluster's nodes. If you look, you'll see that the configuration you're proposing is in the neighborhood of a "Light Processing Configuration", but it falls short of every other configuration listed. As long as you don't build a fully stacked cluster with every service imaginable (it seems like you don't intend to do that), the "Light Processing" config could suffice. You can also check out this other community post to get a better idea of how many nodes you would want when speccing your cluster.
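For a rough feel of the storage side of that node count, here is a minimal back-of-the-envelope sketch. It is not from the blog post or from Gluent. It assumes the worst case where all ~1TB is offloaded, the stated 1GB/month growth held flat over a 36-month horizon, the HDFS default replication factor of 3, and a 25% headroom for temporary and non-HDFS space. The horizon, the headroom figure, and the function names are assumptions made for illustration only.

```python
import math


def raw_capacity_gb(data_gb, growth_gb_per_month, months,
                    replication=3, headroom=0.25):
    """Estimate raw disk (GB) needed across the whole cluster.

    replication=3 is the HDFS default; headroom is an assumed fraction
    of disk kept free for temp files, logs, and so on.
    """
    logical_gb = data_gb + growth_gb_per_month * months
    return logical_gb * replication / (1 - headroom)


def nodes_needed(raw_gb, disk_per_node_gb):
    """Number of worker nodes needed to hold raw_gb, rounding up."""
    return math.ceil(raw_gb / disk_per_node_gb)


if __name__ == "__main__":
    # Figures from the question: ~1TB today, ~1GB/month growth.
    raw = raw_capacity_gb(data_gb=1000, growth_gb_per_month=1, months=36)
    for disk_gb in (500, 1000):
        print(f"{disk_gb} GB per node: {nodes_needed(raw, disk_gb)} nodes "
              f"for ~{raw:.0f} GB raw")
```

Under those assumptions the estimate comes to roughly 4.1TB of raw disk, i.e. 9 worker nodes with 500GB each or 5 with 1TB each. Offloading only the cold portion would lower that, and the sketch ignores CPU and RAM entirely, so treat it as a starting point rather than a sizing rule.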


So, in short, perhaps. Let me know if this helps or if you have any other questions.




New Contributor
Posts: 3
Registered: ‎02-01-2017

Re: Minimum hardware configuration for a small cluster with small data

Thanks Josh! Yep, I did read that excellent blog post. I was mostly interested to hear if anyone else has been in a similar situation, and how they resolved it. We'll proceed as best we can and I'll report back here in the future to let everyone know how it goes.