What are the best practices and recommendations for adding more datanodes to the large clusters in production?

The HDFS Balancer can run in the background, and the bandwidth it consumes is configurable. On a large cluster it can run continuously, but running it after adding new nodes is a must to keep the system healthy. Note that on large clusters a single convergence run can take a full day or more. That shouldn't scare you away, though: just let it run.
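As a rough illustration of the bandwidth control mentioned above, here is a minimal Python sketch that only builds the two command lines involved: `hdfs dfsadmin -setBalancerBandwidth` (bytes per second per datanode) and `hdfs balancer -threshold` (percent of disk capacity). The helper name `balancer_commands`, the 10 MiB/s figure, and the sanity checks are my own assumptions for the example, not recommendations; pick values that suit your cluster and Hadoop version.

```python
import shlex
from typing import List


def balancer_commands(bandwidth_bytes_per_sec: int,
                      threshold_pct: float = 10.0) -> List[List[str]]:
    """Build (but do not run) the commands for one throttled balancer pass.

    Hypothetical helper for illustration. The sanity checks below are this
    sketch's own, not the validation performed by the hdfs tools.
    """
    if bandwidth_bytes_per_sec <= 0:
        raise ValueError("bandwidth must be a positive number of bytes/second")
    if not 0 < threshold_pct <= 100:
        raise ValueError("threshold is a percentage of disk capacity")
    return [
        # Cap the network bandwidth each datanode may spend on balancing.
        ["hdfs", "dfsadmin", "-setBalancerBandwidth",
         str(bandwidth_bytes_per_sec)],
        # Start the balancer; it stops once every datanode is within
        # threshold_pct of the cluster's average utilization.
        ["hdfs", "balancer", "-threshold", str(threshold_pct)],
    ]


if __name__ == "__main__":
    # Assumed example value: 10 MiB/s per datanode.
    for cmd in balancer_commands(10 * 1024 * 1024):
        print(shlex.join(cmd))
```

The sketch deliberately prints the commands instead of executing them, so you can review them before running anything against a production cluster.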

Also, some customers have reported a more stable experience when adding nodes in small batches of a few at a time instead of, for example, adding a full rack at once.
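If you script the rollout, the batching idea could be sketched like this in Python. The helper name `node_batches`, the hostnames, and the batch size of 3 are all hypothetical; "a few" is as specific as the reports get, so treat the size as something to tune.

```python
from typing import Iterator, List, Sequence


def node_batches(hosts: Sequence[str],
                 batch_size: int = 3) -> Iterator[List[str]]:
    """Yield the new datanode hosts in small consecutive batches.

    batch_size=3 is an assumed default for illustration only.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(hosts), batch_size):
        yield list(hosts[start:start + batch_size])


if __name__ == "__main__":
    # Hypothetical hostnames for a rack of seven new datanodes.
    new_rack = [f"dn{i}.example.com" for i in range(1, 8)]
    for batch in node_batches(new_rack):
        # Add this batch, let the cluster settle, then move on to the next.
        print(batch)
```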