Upgrading the cluster - OS, HDP, resources


Hi,

We have a development/test cluster of 4 nodes, previously used for Cassandra, each with 12 vCPU cores, a 2 TB SSD and 32 GB RAM. All nodes run Ubuntu 14. It is configured as follows:

1 NameNode (NN)

1 Secondary NameNode (SNN)

4 DataNodes (DN)

Jobs are taking longer because of limited resources, so we want to upgrade HDP from 2.5 to 2.6, upgrade the OS to Ubuntu 16.04, and add more resources.

We plan to upgrade as follows:

1. Increase RAM to 128 GB on all nodes

2. Double the storage by adding additional SAS disks

3. Add 2 more DataNodes with 10 cores and 128 GB RAM
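A rough before/after capacity comparison for this plan can be sketched as below. It is a back-of-the-envelope calculation under stated assumptions, not a sizing tool: it reads "double the storage" as 2 TB SSD plus 2 TB SAS per existing node, uses a hypothetical 4 TB for each new node (the post doesn't give their storage), and estimates usable HDFS space simply as raw space divided by the HDFS default replication factor of 3, ignoring reserved non-DFS space and temporary data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    cores: int
    ram_gb: int
    disk_tb: float


def totals(nodes):
    """Sum cores, RAM (GB) and raw disk (TB) over a list of DataNodes."""
    return (
        sum(n.cores for n in nodes),
        sum(n.ram_gb for n in nodes),
        sum(n.disk_tb for n in nodes),
    )


def usable_hdfs_tb(raw_tb, replication=3):
    """Rough usable HDFS capacity: every block is stored `replication` times.
    Ignores non-DFS reserved space and temporary/shuffle data."""
    return raw_tb / replication


# Current cluster: 4 nodes with 12 vCPU, 32 GB RAM and a 2 TB SSD each.
current = [Node(cores=12, ram_gb=32, disk_tb=2.0)] * 4

# Planned: RAM raised to 128 GB, storage doubled with SAS (assumed 2 TB SSD +
# 2 TB SAS per node), plus 2 new nodes with 10 cores / 128 GB.
# NEW_NODE_DISK_TB is a hypothetical value; the post doesn't state it.
NEW_NODE_DISK_TB = 4.0
planned = [Node(cores=12, ram_gb=128, disk_tb=4.0)] * 4 + [
    Node(cores=10, ram_gb=128, disk_tb=NEW_NODE_DISK_TB)
] * 2

for label, nodes in (("current", current), ("planned", planned)):
    cores, ram, raw = totals(nodes)
    print(f"{label}: {cores} cores, {ram} GB RAM, {raw:.0f} TB raw, "
          f"~{usable_hdfs_tb(raw):.1f} TB usable at replication 3")
```

Under these assumptions it reports 48 cores, 128 GB RAM and 8 TB raw today versus 68 cores, 768 GB RAM and 24 TB raw after the upgrade; the usable figures scale directly with whatever replication factor is actually configured.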

My questions are:

1. Is it good practice to use all nodes as DataNodes, or should we put the NameNode on a separate node?

2. Since the hardware was borrowed from Cassandra, is it OK to keep all nodes at the same specs, including the 2 new nodes?

3. Is there any harm in mixing storage types, i.e. adding SAS disks alongside the existing SSDs?

4. With the proposed new cluster, upgrading to HDP 3.0 shouldn't be an issue, I guess?

Thanks in advance

MB

Accepted solution:


@MB My responses:

1. Yes, more DataNodes always help, and data is replicated across nodes according to the replication factor you choose (the default is 3). Running a DataNode and the NameNode on the same host is not a bad idea as long as the host has sufficient resources. Critical production clusters could use a dedicated NameNode host so that I/O overhead from the DataNode can't cause problems for the NameNode.

2. I'm not sure of the context behind Cassandra, but having nodes with different specs in a cluster should not be a problem. For better resource management, try creating config groups of nodes and assign the better-spec nodes to the components that are critical for your use case or that need more resources.

3. It's worth researching further, but I don't think it should be a problem.

4. It should not be.
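Answers 2 and 3 can be made a little more concrete with a sketch of per-config-group settings for a mixed cluster. Everything here is an assumption for illustration: the group names, mount points and disk layout are hypothetical, the 16 GB / 2 vcores held back for the OS and Hadoop daemons are placeholders (hosts that also run the NameNode or SNN would need more), and the script only computes property values; applying them per host group (for example through Ambari config groups) is not shown. The property names themselves (dfs.datanode.data.dir, yarn.nodemanager.resource.memory-mb, yarn.nodemanager.resource.cpu-vcores) are standard HDFS/YARN settings.

```python
# Hypothetical node groups; names, mount points and disk layout are placeholders.
NODE_GROUPS = {
    # Existing nodes after the upgrade: 12 vCPU, 128 GB, SSD plus a new SAS disk.
    "dn-12core": {"vcores": 12, "ram_gb": 128,
                  "ssd": ["/grid/ssd0"], "sas": ["/grid/sas0"]},
    # New nodes: 10 vCPU, 128 GB; disk layout assumed (not given in the thread).
    "dn-10core": {"vcores": 10, "ram_gb": 128,
                  "ssd": [], "sas": ["/grid/sas0", "/grid/sas1"]},
}


def data_dir(ssd_mounts, sas_mounts):
    """Build a dfs.datanode.data.dir value, tagging each volume with its HDFS
    storage type ([SSD] or [DISK]; SAS spinning disks map to DISK)."""
    dirs = [f"[SSD]file://{m}/hadoop/hdfs/data" for m in ssd_mounts]
    dirs += [f"[DISK]file://{m}/hadoop/hdfs/data" for m in sas_mounts]
    return ",".join(dirs)


def group_properties(group, reserved_gb=16, reserved_vcores=2):
    """Per-group HDFS/YARN values; the reserved amounts are placeholder
    headroom for the OS and Hadoop daemons, not recommended values."""
    return {
        "dfs.datanode.data.dir": data_dir(group["ssd"], group["sas"]),
        "yarn.nodemanager.resource.memory-mb": (group["ram_gb"] - reserved_gb) * 1024,
        "yarn.nodemanager.resource.cpu-vcores": group["vcores"] - reserved_vcores,
    }


for name, group in NODE_GROUPS.items():
    print(name)
    for key, value in group_properties(group).items():
        print(f"  {key} = {value}")
```

One caveat worth checking against the HDFS storage-policy documentation: untagged volumes default to the DISK storage type, and the default HOT policy places blocks only on DISK, so volumes tagged [SSD] would only receive data for paths given an SSD-using policy such as ONE_SSD or ALL_SSD. Leaving every volume untagged keeps HDFS treating the SSD and SAS disks uniformly.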


Thanks Gaurav, it was helpful.