I am trying to add a new disk to a data node and have some questions:
1- To add a disk to a data node, do I have to add disks on all nodes at the same time?
2- Do I have to update both the dfs.name.dir and dfs.data.dir parameters?
3- Do I have to update dfs.name.dir/dfs.data.dir on each data node if I add a disk on only one data node? (A sketch of what I think the change looks like is below.)
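For reference, here is roughly what I think the dfs.data.dir change would look like in hdfs-site.xml on the node that gets the new disk - please correct me if this is wrong. The mount point /data/disk2 is just an example path, not my actual layout:

  <property>
    <name>dfs.data.dir</name>
    <value>/data/disk1/dfs/dn,/data/disk2/dfs/dn</value>
  </property>

i.e. the existing directory list with the new mount appended, comma-separated, followed (as far as I understand) by a restart of the DataNode on that host.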
Hi Gautam, team
I have follow-up questions on this topic:
1. The customer has one cluster with 17 nodes and wants to add more storage (no intention of increasing compute), taking node storage to >=48 TB. Is this recommended? If so, please share some pointers.
2. In this reference doc, http://i.dell.com/sites/doccontent/business/large-business/en/Documents/Dell-Cloudera-Apache-Hadoop-...., it is mentioned that "For drive capacities greater than 4TB or node storage density over 48TB, special consideration is required for HDFS setup. Configurations of this size are close to the limit of Hadoop per-node storage capacity". Please share insights in this regard as well.
3. Are there any side effects of doing this (e.g. job performance), and what other considerations apply?
People usually consider the following during hardware sizing:
1. the number of disk spindles and their throughput
2. the total time needed to re-replicate the lost data when one of the nodes fails
Considering the above, 12x2TB works better than 12x4TB (a rough calculation is below).
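As a rough illustration (the figures are assumptions, not measurements): if a node with 24 TB of used space (12x2TB) fails, roughly 24 TB has to be re-replicated from the surviving replicas. If the rest of the cluster can absorb that at, say, 2 GB/s aggregate, that is about 24,000 GB / 2 GB/s = 12,000 s, roughly 3.3 hours of under-replication. With 48 TB per node (12x4TB) and the same aggregate bandwidth, the window roughly doubles to about 6-7 hours. The same ratio applies to spindles: 12 spindles behind 24 TB give twice the IOPS per TB of 12 spindles behind 48 TB.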
1. But what are the config changes required to perform these steps?
2. I think we also need to consider the workloads currently running; adding more storage to the same node may cause performance issues.
2. The number of roles on a node and their memory allocations do affect performance (e.g. swapping, GC pauses),
so you have to calculate them carefully during the deployment phase (a rough example is below).
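As a rough sketch (all figures are illustrative assumptions, not a recommendation): on a 128 GB node running a DataNode and a NodeManager, you might set aside something like 8 GB for the OS and a few GB each for the DataNode and NodeManager heaps, leaving roughly 110 GB for containers. If the role heaps plus the container allocations add up to more than the physical RAM, the node starts swapping, and oversized heaps on any single role tend to stretch GC pauses. Adding disks by itself does not change this memory math, but adding more roles or bigger heaps to the same node does.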
1 . "But what are the config changes required to perform these steps " - Could you be more specfic