
Adding New Hosts with Extra Disks

New Contributor

Hi, I need to add new hosts to an existing cluster using Ambari, but the new hosts have more disks than the old nodes, and I want to add those disks to HDFS (old nodes have /data/disk1 and /data/disk2, while new nodes have /data/disk1, /data/disk2, /data/disk3, and /data/disk4). How can I add those disks after adding the nodes? Can I just update dfs.datanode.data.dir?

1 ACCEPTED SOLUTION

New Contributor

I finally found the correct way to do this. I used Ambari to create a new configuration group that includes only the new hosts, and then added the extra disk paths to the dfs.datanode.data.dir parameter in that configuration group only. That integrates the extra disks on the new nodes into HDFS; the older nodes are not affected by the change.
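In practice this means the two configuration groups end up with different values for the same property. A sketch of the resulting values (the paths match the ones in the question; the group name is made up for illustration):

```
# Default group (old nodes)
dfs.datanode.data.dir=/data/disk1,/data/disk2

# "new-datanodes" configuration group (new hosts only)
dfs.datanode.data.dir=/data/disk1,/data/disk2,/data/disk3,/data/disk4
```

After saving the group-level override in Ambari, restart the DataNodes in the new group so they pick up the additional directories.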

 

Reference: https://docs.cloudera.com/HDPDocuments/Ambari-2.7.5.0/managing-and-monitoring-ambari/content/amb_man...


4 REPLIES

Contributor

I am not 100% sure, but I don't think you can add more disks on the new machines only. HDFS writes to all disks in round-robin fashion, so you either have to have the same number of disks everywhere or increase the disks on the existing DataNodes as well, and then update dfs.datanode.data.dir accordingly.

New Contributor

@SagarKanani Thank you for your reply.

 

Referring to the documentation, I found the following:

 

dfs.datanode.data.dir

Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data is stored in all named directories, typically on different devices. Directories that do not exist are ignored. Heterogeneous storage allows specifying that each directory resides on a different type of storage: DISK, SSD, ARCHIVE, or RAM_DISK.

(https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.3/bk_hdfs-administration/content/configuration_p...)
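The "directories that do not exist are ignored" behaviour can be illustrated with a small simulation (plain Python, not actual HDFS code; it also ignores the optional [DISK]/[SSD]-style storage-type prefixes):

```python
import os
import tempfile

def usable_data_dirs(dfs_datanode_data_dir: str) -> list[str]:
    """Mimic the documented behaviour: split the comma-delimited
    dfs.datanode.data.dir value and drop directories that do not exist."""
    dirs = [d.strip() for d in dfs_datanode_data_dir.split(",") if d.strip()]
    return [d for d in dirs if os.path.isdir(d)]

# Simulate an "old" node: only disk1 and disk2 exist on the local filesystem.
root = tempfile.mkdtemp()
for disk in ("disk1", "disk2"):
    os.makedirs(os.path.join(root, disk))

# The configured value lists four directories, but two are missing.
value = ",".join(os.path.join(root, d)
                 for d in ("disk1", "disk2", "disk3", "disk4"))
print(usable_data_dirs(value))  # only .../disk1 and .../disk2 survive
```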

I think that means paths /data/disk3 and /data/disk4 would simply be ignored on the old nodes, right?

Has anyone tried this scenario before?

 

Contributor

Ahh ok, I didn't check the documentation, my bad. But the question remains whether it will ignore the missing directories on all nodes or only on the old nodes. I'm interested in how this turns out. Maybe you can do a quick trial? I don't have a dev environment to try it at the moment.

New Contributor

I finally found the correct way to do this. I used Ambari to create a new configuration group that includes only the new hosts, and then added the extra disk paths to the dfs.datanode.data.dir parameter in that configuration group only. That integrates the extra disks on the new nodes into HDFS; the older nodes are not affected by the change.

 

Reference: https://docs.cloudera.com/HDPDocuments/Ambari-2.7.5.0/managing-and-monitoring-ambari/content/amb_man...