
Adding New Hosts with Extra Disks


New Contributor

Hi, I need to add new hosts to an existing cluster using Ambari, but the new hosts have more disks than the old nodes, and I want to add those extra disks to HDFS (old nodes have /data/disk1 and /data/disk2, while new nodes have /data/disk1, /data/disk2, /data/disk3, and /data/disk4). How can I add those disks after adding the nodes? Can I just update dfs.datanode.data.dir?

1 ACCEPTED SOLUTION


Re: Adding New Hosts with Extra Disks

New Contributor

I finally found the correct way to do this. I used Ambari to create a new configuration group that includes only the new hosts, and then added the extra disk paths to the dfs.datanode.data.dir parameter in that configuration group only. This integrates the extra disks into HDFS on the new nodes only; the older nodes are not affected by the parameter change.
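
To make the idea concrete, here is a minimal Python sketch of how the two configuration groups end up with different dfs.datanode.data.dir values. It only models the concept and does not call Ambari; the host names and group names are hypothetical, and the disk paths are the ones from the question (real clusters often point at a subdirectory on each disk instead).

```python
def data_dir_value(mount_points):
    """Join disk paths into a comma-delimited dfs.datanode.data.dir value."""
    return ",".join(mount_points)


# Disk paths taken from the question.
OLD_NODE_DISKS = ["/data/disk1", "/data/disk2"]
NEW_NODE_DISKS = OLD_NODE_DISKS + ["/data/disk3", "/data/disk4"]

# Hypothetical group and host names, for illustration only.
config_groups = {
    "Default": {
        "hosts": ["old-node-1", "old-node-2"],
        "dfs.datanode.data.dir": data_dir_value(OLD_NODE_DISKS),
    },
    "new-hosts-4-disks": {
        "hosts": ["new-node-1", "new-node-2"],
        "dfs.datanode.data.dir": data_dir_value(NEW_NODE_DISKS),
    },
}


def effective_data_dirs(host):
    """Return the data.dir value of the configuration group that owns the host."""
    for group in config_groups.values():
        if host in group["hosts"]:
            return group["dfs.datanode.data.dir"]
    raise KeyError(f"{host} is not assigned to any configuration group")


if __name__ == "__main__":
    for host in ("old-node-1", "new-node-1"):
        print(host, "->", effective_data_dirs(host))
```

The point of the override group is that the old nodes never see the /data/disk3 and /data/disk4 entries at all, so there is no need to rely on how a DataNode treats a missing directory.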

 

Reference: https://docs.cloudera.com/HDPDocuments/Ambari-2.7.5.0/managing-and-monitoring-ambari/content/amb_man...


4 REPLIES 4

Re: Adding New Hosts with Extra Disks

Contributor

I am not 100% sure, but I don't think you can add more disks only on the new machines. HDFS writes round-robin across all disks, so you either need the same number of disks on every node or need to add disks to the existing DataNodes. Then you update dfs.datanode.data.dir accordingly.


Re: Adding New Hosts with Extra Disks

New Contributor

@SagarKanani Thank you for your reply.

 

Referring to the documentation, I found the following:

 

dfs.datanode.data.dir

Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data is stored in all named directories, typically on different devices. Directories that do not exist are ignored. Heterogeneous storage allows specifying that each directory resides on a different type of storage: DISK, SSD, ARCHIVE, or RAM_DISK.

(https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.3/bk_hdfs-administration/content/configuration_p...)

I think that means paths /data/disk3 & /data/disk4 will be ignored on old nodes, right?

Has anyone tried this scenario before?
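
For readers unfamiliar with the format described in the quoted documentation, the following Python sketch splits a dfs.datanode.data.dir value into its entries. It assumes the bracketed prefix syntax (for example [SSD]/path) for tagging a directory with a storage type, and that untagged directories count as DISK; parse_data_dirs is a hypothetical helper written for illustration, not part of Hadoop or Ambari.

```python
import re

# Storage types named in the quoted documentation.
STORAGE_TYPES = {"DISK", "SSD", "ARCHIVE", "RAM_DISK"}

# Assumed tag syntax: an optional "[TYPE]" prefix in front of the path.
_TAGGED_ENTRY = re.compile(r"^\[(\w+)\](.+)$")


def parse_data_dirs(value):
    """Split a comma-delimited data.dir value into (storage_type, path) pairs."""
    entries = []
    for raw in value.split(","):
        raw = raw.strip()
        if not raw:
            continue  # skip empty items such as a trailing comma
        match = _TAGGED_ENTRY.match(raw)
        if match:
            storage_type, path = match.group(1).upper(), match.group(2)
        else:
            storage_type, path = "DISK", raw  # assumed default for untagged entries
        if storage_type not in STORAGE_TYPES:
            raise ValueError(f"unknown storage type: {storage_type}")
        entries.append((storage_type, path))
    return entries


if __name__ == "__main__":
    for entry in parse_data_dirs("/data/disk1,/data/disk2,[SSD]/data/disk3"):
        print(entry)
```

This only shows how the value is structured; it does not check whether the directories exist on a given node.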

 


Re: Adding New Hosts with Extra Disks

Contributor

Ah, OK. I didn't check the documentation, my bad. But the question remains whether it will ignore the directory on all nodes or only on the old nodes. I am interested in how this turns out. Maybe you can do a quick trial? I don't have a dev environment to try it in at the moment.

