Sizing for Master/Edges servers

I'm helping a prospect plan an expansion from their current 6-node Hadoop cluster to more than 1 PB and about a hundred nodes. I gave him some hints:

- Master and edge nodes running in a virtual environment (they do not require high I/O, and virtualization can increase availability)

- Knox as the security perimeter gateway

- Dedicated database nodes with high availability

I need help with recommended sizing and notes for the items below:

- Master nodes: what is the recommended RAM for a master? The prospect asked me to consider that their virtualization hosts usually have 512 GB of RAM, and that they usually don't allocate more than 64 GB to a virtual host.

- Edge nodes

- Knox: do we have any sizing for Knox?

- Database servers: do we have any sizing for dedicated database servers (for metadata: Ambari, Hue, Hive Metastore, Oozie, etc.)?

Thanks.

Guilherme.

1 ACCEPTED SOLUTION

Master Mentor

@Guilherme Braccialli

Please see my comments below:

- Master nodes: what is the recommended RAM for a master? The prospect asked me to consider that their virtualization hosts usually have 512 GB of RAM, and that they usually don't allocate more than 64 GB to a virtual host.

Comment: 256 GB per master node is a good start.

If you are referring to 512 GB in each physical node, then that is 2 VMs per bare-metal host.
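The arithmetic behind that can be sketched in a few lines of Python. This is only a rough illustration: the function name `vms_per_host` and the `reserved_gb` parameter (RAM held back for the hypervisor) are my own assumptions, not something from a sizing guide. The 512 GB, 256 GB, and 64 GB figures come from this thread.

```python
def vms_per_host(host_ram_gb: int, vm_ram_gb: int, reserved_gb: int = 0) -> int:
    """Return how many VMs of vm_ram_gb fit in one physical host's RAM.

    reserved_gb is a hypothetical allowance for hypervisor overhead;
    it defaults to 0, which matches the simple 512 / 256 = 2 estimate.
    """
    if vm_ram_gb <= 0:
        raise ValueError("vm_ram_gb must be positive")
    usable_gb = max(host_ram_gb - reserved_gb, 0)
    return usable_gb // vm_ram_gb


HOST_RAM_GB = 512       # physical host RAM mentioned in the question
MASTER_VM_RAM_GB = 256  # suggested starting point per master node
VM_CAP_GB = 64          # the prospect's usual per-VM allocation limit

masters_per_host = vms_per_host(HOST_RAM_GB, MASTER_VM_RAM_GB)
exceeds_cap = MASTER_VM_RAM_GB > VM_CAP_GB

print(f"Master VMs per physical host: {masters_per_host}")
print(f"Suggested master size exceeds the usual VM cap: {exceeds_cap}")
```

With these inputs you get 2 master VMs per physical host, and the check shows that 256 GB is above the 64 GB the prospect normally allocates, so that limit would need to be discussed with them. Setting `reserved_gb` above zero drops the result to 1 VM per host.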

- Edge nodes

- Knox: do we have any sizing for Knox?

Comment: It's a lightweight instance, so 64 GB is a good number (depending on how much traffic comes to the Knox gateway).

- Database servers: do we have any sizing for dedicated database servers (for metadata: Ambari, Hue, Hive Metastore, Oozie, etc.)?

Comment: A dedicated instance for the DB is good practice.

Memory: 128 GB is a good start (MySQL, Postgres, Oracle). In production, it's very important to have HA for the DB.

CPU: dual 8-core, or quad core if possible (assuming a large cluster).
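If it helps with the host-count calculation, here is a minimal sketch that adds up RAM for the non-worker roles. The per-node figures are the starting points from my comments above; the node counts in `example_counts` are purely hypothetical placeholders, so replace them with the real plan. Edge nodes are left out because no figure was given for them here.

```python
# RAM starting points per role in GB per node, taken from the comments above.
RAM_GB = {"master": 256, "knox": 64, "database": 128}


def total_ram_gb(node_counts: dict[str, int]) -> int:
    """Sum the RAM needed for the given number of nodes per role."""
    unknown = set(node_counts) - set(RAM_GB)
    if unknown:
        raise KeyError(f"No RAM figure for role(s): {sorted(unknown)}")
    return sum(RAM_GB[role] * count for role, count in node_counts.items())


# Hypothetical counts, for illustration only (not a recommendation).
example_counts = {"master": 3, "knox": 2, "database": 2}

print(f"Total RAM for non-worker roles: {total_ram_gb(example_counts)} GB")
```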

3 REPLIES

Could you please provide some more details about the services that will be deployed and used? Hive? HBase? Spark?

@Jonas Straub Initially only Hive, but in the future HBase, Solr, and Spark as well. The prospect does not have all the details yet, such as the number of users, amount of data, etc. For now, overall guidelines and basic calculations that lead to the number of hosts will help a lot. Thanks.
