Created 06-03-2018 12:56 PM
Hi,
We have a development/test cluster with 4 nodes, each having 12 vCPU cores, a 2 TB SSD and 32 GB RAM, previously used for Cassandra. All of them run Ubuntu 14. It is configured as follows:
1 NN
1 SNN
4 DN
Jobs are taking longer due to limited resources, so we want to upgrade HDP from 2.5 to 2.6 and the OS to Ubuntu 16.04, and add more resources.
We are planning to upgrade as follows:
1. Increase RAM to 128 GB on all nodes
2. Double the storage by adding additional SAS disks
3. Add 2 more DataNodes with 10 cores and 128 GB RAM each
My questions are:
1. Is it good practice to use all nodes as DNs, or should we separate the NN?
2. Since the hardware was borrowed from Cassandra, is it OK to keep all nodes at the same specs, including the 2 new nodes?
3. Is there any harm in mixing storage types, i.e. adding extra SAS storage alongside the existing SSDs?
4. With the proposed new cluster, upgrading to HDP 3.0 shouldn't be an issue, I guess?
Thanks in advance
MB
Created 06-03-2018 09:22 PM
@MB My responses:
1. Yes, more DataNodes generally help. You can have data replicated across nodes by choosing a replication factor; the default is 3. Running a DN and the NN together on a node with sufficient resources is not a bad idea. Critical prod clusters could have a dedicated NN, so that I/O overhead caused by DNs does not create issues for the NN.
2. I'm not sure of the context behind Cassandra, but having nodes with different specs in a cluster should not be a problem. For better management of resources, try creating config groups of nodes and allocate / isolate the better-spec nodes to components that are critical for your use case or that may need more resources.
3. Research this further, but I don't think it should be a problem.
4. It should not be.
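To put the replication factor from point 1 into numbers, here is a rough sketch of how usable HDFS capacity could be estimated for the proposed cluster. The function name, the 25% reserve for non-DFS use (logs, temp and shuffle data), and the assumption that the 2 new nodes also get 4 TB each are mine for illustration and are not stated in the thread.

```python
def usable_hdfs_capacity_tb(node_raw_tb, replication=3, non_dfs_reserve=0.25):
    """Rough usable HDFS capacity in TB.

    node_raw_tb     -- raw disk per DataNode, in TB
    replication     -- HDFS replication factor (default is 3)
    non_dfs_reserve -- fraction of disk kept for non-HDFS use (assumed value)
    """
    raw_total = sum(node_raw_tb)
    # Every block is stored `replication` times, so divide the DFS share by it.
    return raw_total * (1 - non_dfs_reserve) / replication


# 4 existing nodes after doubling storage from 2 TB to 4 TB each
existing = [4, 4, 4, 4]
# Assumption: the 2 new DataNodes also carry 4 TB each
proposed = existing + [4, 4]

print(usable_hdfs_capacity_tb(existing))
print(usable_hdfs_capacity_tb(proposed))
```

With these assumed figures, the two extra DataNodes add raw space, but only about a third of it becomes usable at replication factor 3.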
Created 06-07-2018 09:33 AM
Thanks Gaurav, it was helpful.