Created 09-29-2015 01:51 AM
Created 09-29-2015 09:25 AM
@pardeep.kumar@hortonworks.com
Listing the ones I am aware of.
1). You can add the nodes either with Ambari Blueprints (https://cwiki.apache.org/confluence/display/AMBARI/Blueprints#Blueprints-AddingHoststoanExistingCluster) or through Ambari itself. The Blueprint route is much easier.
2). After adding the data nodes, run the HDFS Balancer during a quiet period.
3). Set dfs.namenode.handler.count to ln(number of DataNodes) * 20.
4). Set dfs.namenode.service.handler.count to ln(number of DataNodes) * 20 as well.
Here ln is the natural logarithm.
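To make the ln(number of DataNodes) * 20 rule concrete, here is a minimal Python sketch. The function name, truncating to an integer, and the floor of 10 (the stock default for dfs.namenode.handler.count) are my own assumptions, not part of the recommendation itself.

```python
import math


def namenode_handler_count(num_datanodes, factor=20, floor=10):
    """Suggested handler count: ln(number of DataNodes) * factor.

    Assumptions for illustration only: the result is truncated to an
    integer and never drops below `floor`.
    """
    if num_datanodes < 1:
        raise ValueError("num_datanodes must be at least 1")
    # math.log with a single argument is the natural logarithm (ln).
    suggested = int(math.log(num_datanodes) * factor)
    return max(floor, suggested)


if __name__ == "__main__":
    for n in (10, 100, 500):
        print(n, "DataNodes ->", namenode_handler_count(n))
```

The same value would go into both dfs.namenode.handler.count and dfs.namenode.service.handler.count; treat it as a starting point and tune from there.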
Others can add to or correct these recommendations.
Created 09-29-2015 01:49 PM
The HDFS Balancer can run in the background, and the bandwidth it consumes is configurable. On a large cluster it can generally run continuously, but running it after adding new nodes is a must for a healthy system. Note that on large clusters a single convergence run can take a full day or more. That shouldn't scare you away; just let it run.
Also, some customers have reported a more stable experience when adding nodes in small batches of a few at a time rather than, for example, a full rack at once.
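As a rough illustration of the bandwidth control mentioned above, the sketch below builds the two commands usually involved: `hdfs dfsadmin -setBalancerBandwidth` (bytes per second, per DataNode) and `hdfs balancer -threshold` (allowed deviation from average utilization, in percent). The helper names and the example values are hypothetical; pick numbers that suit your own cluster.

```python
import subprocess


def balancer_commands(bandwidth_bytes_per_sec, threshold_pct=10):
    """Return the argv lists for capping balancer bandwidth and starting a run.

    The default threshold of 10 percent and the helper name are
    illustrative assumptions.
    """
    if bandwidth_bytes_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    return [
        # Cap the bandwidth each DataNode may spend on balancing.
        ["hdfs", "dfsadmin", "-setBalancerBandwidth", str(bandwidth_bytes_per_sec)],
        # Start the balancer; it runs until the cluster is within the threshold.
        ["hdfs", "balancer", "-threshold", str(threshold_pct)],
    ]


def run_balancer(bandwidth_bytes_per_sec, threshold_pct=10):
    """Run both commands in order, stopping if either one fails."""
    for cmd in balancer_commands(bandwidth_bytes_per_sec, threshold_pct):
        subprocess.run(cmd, check=True)
```

For example, `run_balancer(10 * 1024 * 1024)` would cap balancing traffic at roughly 10 MB/s per DataNode before starting the run. It needs to be run as a user with HDFS admin rights on a host that has the `hdfs` client installed.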