Created 07-22-2020 01:11 PM
Hi, I need to add new hosts to an existing cluster using Ambari, but the new hosts have more disks than the old nodes (old nodes have /data/disk1 & /data/disk2, while new nodes have /data/disk1, /data/disk2, /data/disk3 & /data/disk4). How can I add those extra disks to HDFS after adding the nodes? Can I just update dfs.datanode.data.dir?
Created 08-06-2020 07:49 AM
I finally found the correct way to do this. I used Ambari to create a new configuration group that includes only the new hosts, and then added the extra disk paths to the dfs.datanode.data.dir parameter in that configuration group only. This integrates the extra disks on the new nodes into HDFS; the older nodes are not affected by the change.
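For illustration, the resulting per-group values might look like the sketch below (mount points taken from the question; in Ambari you set this per configuration group in the UI rather than editing hdfs-site.xml by hand):

```xml
<!-- Default group (old nodes, 2 disks) -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/disk1,/data/disk2</value>
</property>

<!-- New configuration group (new hosts only, 4 disks) -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/disk1,/data/disk2,/data/disk3,/data/disk4</value>
</property>
```

After restarting the affected DataNodes, `hdfs dfsadmin -report` should show the added capacity on the new hosts.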
Created 07-23-2020 01:52 AM
I am not 100% sure, but I don't think you can add more disks on the new machines only. HDFS does round-robin writes across all disks, so you would either need the same number of disks everywhere or have to add disks to the existing DataNodes as well. Then you update dfs.datanode.data.dir accordingly.
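For what it's worth, the round-robin behavior happens per DataNode across that node's own dfs.datanode.data.dir list. A toy sketch (the class here is hypothetical, not a Hadoop API):

```python
# Toy illustration of round-robin block placement across a single
# DataNode's configured data directories (hypothetical class, not
# part of Hadoop itself).
class RoundRobinVolumePicker:
    def __init__(self, volumes):
        self.volumes = list(volumes)
        self._next = 0

    def choose(self):
        # Each new block replica lands on the next volume in turn.
        vol = self.volumes[self._next]
        self._next = (self._next + 1) % len(self.volumes)
        return vol

picker = RoundRobinVolumePicker(["/data/disk1", "/data/disk2"])
writes = [picker.choose() for _ in range(4)]
print(writes)  # → ['/data/disk1', '/data/disk2', '/data/disk1', '/data/disk2']
```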
Created 07-27-2020 02:51 AM
@SagarKanani Thank you for your reply.
Referring to the documentation, I found the following:
dfs.datanode.data.dir
Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data is stored in all named directories, typically on different devices. Directories that do not exist are ignored. Heterogeneous storage allows specifying that each directory resides on a different type of storage: DISK, SSD, ARCHIVE, or RAM_DISK.
I think that means the paths /data/disk3 & /data/disk4 would simply be ignored on the old nodes, right?
Has anyone tried this scenario before?
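As a side note on the heterogeneous-storage syntax mentioned in the quoted documentation, each directory can be prefixed with a storage type, roughly like this (paths here are just for illustration):

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <value>[DISK]/data/disk1,[DISK]/data/disk2,[SSD]/data/disk3</value>
</property>
```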
Created on 07-28-2020 11:39 PM - edited 07-28-2020 11:39 PM
Ah, OK, I didn't check the documentation, my bad. But the question remains whether it will ignore the missing directories only on the old nodes or on all nodes. I am interested in how this turns out. Maybe you can do a quick trial? I don't have a dev environment to try it in at the moment.