
Uplift estimation of hadoop cluster



Hi All,

We are assessing what is required to scale up our current Hadoop cluster to support a greater number of concurrent users. We would like your help to review our assumptions and approach, and to get broader input from the experts on what to consider. We use Zeppelin, from which users run queries through HiveServer2. Our current cluster size, specification, and a few considerations regarding the use cases are listed below.

  • 8-node cluster: 2 master nodes + 6 slave nodes (8 vCPU + 1 TB disk space + 64 GB RAM)
  • Total data size is around 1.5 TB
  • Replication factor is 3
  • Initially there are 50 users on Zeppelin, but going forward there will be 150 users, of whom 15-20 will be concurrent.
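
For reference, the figures above can be turned into a rough back-of-the-envelope check. This is only a sketch under assumptions that are not stated in the post: that the listed spec is per node, that the 1.5 TB figure is the size before replication, that only the slave nodes store data, and that memory is split evenly between concurrent users with nothing reserved for the OS or the Hadoop daemons.

```python
# Rough sizing arithmetic from the figures in the post.
# Assumptions (not confirmed in the post): the spec is per node, 1.5 TB is
# the pre-replication data size, and only slave nodes hold HDFS data.
SLAVE_NODES = 6
DISK_TB_PER_NODE = 1.0
RAM_GB_PER_NODE = 64
DATA_TB = 1.5
REPLICATION_FACTOR = 3
CONCURRENT_USERS = 20

# Raw disk available across the slave nodes.
raw_disk_tb = SLAVE_NODES * DISK_TB_PER_NODE

# Disk consumed once every block is stored REPLICATION_FACTOR times.
replicated_data_tb = DATA_TB * REPLICATION_FACTOR

# Fraction of raw disk already used by replicated data.
disk_utilisation = replicated_data_tb / raw_disk_tb

# Naive upper bound on memory per concurrent user (ignores OS and daemons).
total_ram_gb = SLAVE_NODES * RAM_GB_PER_NODE
ram_gb_per_user = total_ram_gb / CONCURRENT_USERS

print(f"raw disk: {raw_disk_tb} TB, replicated data: {replicated_data_tb} TB")
print(f"disk utilisation: {disk_utilisation:.0%}")
print(f"total RAM: {total_ram_gb} GB, per concurrent user: {ram_gb_per_user:.1f} GB")
```

Under those assumptions, disk is already about three quarters full before any temporary or intermediate query data is counted, and the per-user memory figure is an optimistic ceiling rather than a usable budget.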

To maintain good performance, we want to estimate the upsizing needed to support parallel queries on the cluster from 20+ concurrent users on Zeppelin.


Re: Uplift estimation of hadoop cluster


I'm not offering a direct answer, but a couple of things to think about.

If your total data size is only 1.5 TB, then it can all be kept in memory across a small number of nodes. Since it's in memory, you won't need to use HDFS; you can use S3 or EBS. Use 8 nodes with 256 GB of memory each, then add vcores if you need more performance.
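
As a quick sanity check on that suggestion, here is a minimal sketch of the arithmetic. It assumes 1 TB = 1024 GB and that the 1.5 TB is a single unreplicated copy; it does not account for OS, JVM, or query working memory, so the real headroom would be smaller.

```python
# Does 1.5 TB of data fit in the aggregate memory of 8 nodes x 256 GB?
# Assumptions: 1 TB = 1024 GB, one unreplicated copy of the data, and no
# memory reserved for the OS, JVM overhead, or query working space.
NODES = 8
RAM_GB_PER_NODE = 256
DATA_GB = 1.5 * 1024

total_ram_gb = NODES * RAM_GB_PER_NODE   # aggregate cluster memory
headroom_gb = total_ram_gb - DATA_GB     # memory left after holding the data
fill_ratio = DATA_GB / total_ram_gb      # share of memory taken by the data

print(f"total RAM: {total_ram_gb} GB, data: {DATA_GB:.0f} GB")
print(f"headroom: {headroom_gb:.0f} GB, fill ratio: {fill_ratio:.0%}")
```

So the data would occupy roughly three quarters of the aggregate memory, leaving about 512 GB for everything else across the 8 nodes.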
