
Add a disk to a data node manually

Explorer

Hi,

I am trying to add a new disk to a data node and have a few questions:

1. To add a disk to a data node, do I have to add a disk to all nodes at the same time?

2. Do I have to update both the dfs.name.dir and dfs.data.dir parameters?

3. Do I have to update dfs.name.dir/dfs.data.dir on every data node if I add a disk to only one data node?

1 ACCEPTED SOLUTION

> 1. To add a disk to a data node, do I have to add a disk to all nodes at the same time?

Each node can have a different number of disks. It's not ideal, but it isn't wrong either.


> 2. Do I have to update both the dfs.name.dir and dfs.data.dir parameters?

If you're using the new disk to store HDFS blocks, update dfs.data.dir. If you're using it for NameNode metadata, update dfs.name.dir.


> 3. Do I have to update dfs.name.dir/dfs.data.dir on every data node if I add a disk to only one data node?

dfs.data.dir can differ from one DataNode to another if the nodes have different mount points and/or numbers of disks.
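
For illustration only, here is a minimal hdfs-site.xml sketch for the DataNode that received the new disk; the mount points are placeholders, and on newer releases the properties are named dfs.datanode.data.dir and dfs.namenode.name.dir:

```xml
<!-- hdfs-site.xml on the DataNode that got the extra disk.
     Paths are examples only; substitute your actual mount points. -->
<property>
  <name>dfs.data.dir</name>
  <!-- existing disks plus the new mount point, comma-separated -->
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
</property>
```

Only the node whose disks changed needs this edit, and dfs.name.dir is left untouched unless the disk is meant for NameNode metadata. Restart that DataNode so it picks up the new directory; HDFS will write new blocks to the added disk but will not move existing blocks onto it by itself.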


Regards,
Gautam Gopalakrishnan


6 REPLIES


New Contributor

Hi Gautam, team,

I have some follow-up questions on this topic:

1. A customer has one cluster with 17 nodes and wants to add more storage (with no intention of increasing compute), taking node density to >= 48 TB. Is this recommended? If so, please share some pointers.

2. This reference doc, http://i.dell.com/sites/doccontent/business/large-business/en/Documents/Dell-Cloudera-Apache-Hadoop-..., mentions: "For drive capacities greater than 4TB or node storage density over 48TB, special consideration is required for HDFS setup. Configurations of this size are close to the limit of Hadoop per-node storage capacity". Please share insights in this regard as well.

3. Are there any side effects of doing this (e.g. job performance, etc.), and what are the considerations?

Regards,

Prad

Champion

@prad

People usually consider the following during hardware sizing:

1. The number of disk spindles and their throughput.

2. The time it takes to re-replicate the lost data when one of the nodes fails. For example, at the same aggregate re-replication bandwidth, recovering a failed node that held 48 TB takes roughly twice as long as recovering one that held 24 TB, so the cluster stays under-replicated for longer.

We use 12 x 2 TB, which works better than 12 x 4 TB considering the above.

New Contributor

@csguna

 

Thanks.

 

1. But what are the config changes required to perform these steps?

2. I think we also need to consider the workloads that are currently running; could adding more storage to the same node cause performance issues?

 

- Prad

Champion

2. The number of roles on a node and their memory allocations do affect performance (e.g. swapping, GC pauses), so you have to calculate them carefully during the deployment phase.


1. "But what are the config changes required to perform these steps" - could you be more specific?

New Contributor

@csguna

 

Regarding #1... I am looking for any additional configuration steps that are required, or confirmation that no 'additional' config is required.

 

Thanks.