Installing DataNodes with different numbers of JBOD disks using CM

Expert Contributor

Hi experts, the CDH install screens include a DataNode configuration value:

DataNode Data Directory

dfs.data.dir, dfs.datanode.data.dir

 

The description says to use a comma-delimited list of directories on the local file system where the DataNode stores HDFS block data. Typical values are /data/N/dfs/dn for N = 1, 2, 3, ..., with each disk being a JBOD mount. How do we specify this value if the DataNodes have different numbers of JBOD disks, say 20 disks in one and 10 disks in another? During install this appears to be a single global variable, dfs.data.dir, so how does it allocate the 20 data directories on DataNodes that have only 10 JBOD disks? No hostname is defined in this variable to indicate different numbers of disks on different hosts. Also, if new DataNodes with a different number of disks are added in the future, how is this specified when adding them?

I posted this question earlier but didn't get a reply, so I would appreciate any info. Thanks!

1 ACCEPTED SOLUTION

Hi,

When creating your cluster, Cloudera Manager should automatically detect the directories on each host, then use Role Configuration Groups to set distinct configurations for the 10-disk nodes and the 20-disk nodes, and divide roles appropriately between those groups.

dfs.data.dir isn't a global setting. It is a role-level configuration, so it is usually set in the Role Config Group that a DataNode role belongs to, and each group can carry its own directory list.
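As a rough illustration of what each group's value would look like, here is a minimal Python sketch that builds the comma-delimited list for a given disk count. It assumes the /data/N/dfs/dn layout from the question; the group names "DataNode-10-disk" and "DataNode-20-disk" are hypothetical, and the script only prints strings that you would paste into the corresponding Role Config Group. It does not talk to Cloudera Manager.

```python
def datanode_data_dirs(num_disks, pattern="/data/{n}/dfs/dn"):
    """Build the comma-delimited directory list for a host with num_disks JBOD mounts.

    The default pattern follows the /data/N/dfs/dn convention from the question;
    adjust it if your mount points differ.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return ",".join(pattern.format(n=n) for n in range(1, num_disks + 1))


# Hypothetical Role Config Group names, one per hardware profile.
role_group_values = {
    "DataNode-10-disk": datanode_data_dirs(10),
    "DataNode-20-disk": datanode_data_dirs(20),
}

for group, value in role_group_values.items():
    print(f"{group}: {value}")
```

The 10-disk group ends at /data/10/dfs/dn and the 20-disk group at /data/20/dfs/dn, so no host is asked to use a directory on a disk it does not have.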

You can read more about configuration management here:
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_intro_primer.html#concept_fgj_tny...

When you add new datanodes, I suggest creating a host template and applying that to your new nodes, allowing them to easily join the correct DataNode group as well as any other roles you may be running on that node (like a YARN NodeManager). You can read about host templates here:
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_host_templates.html
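
If it helps with planning, a small sketch like the one below can sort hosts into hardware profiles before you decide which host template to apply to each. The host names and disk counts are made-up placeholders, and the grouping is done purely in Python; it is only a planning aid, not something Cloudera Manager runs.

```python
from collections import defaultdict


def group_hosts_by_disk_count(disk_counts):
    """Map each disk count to the sorted list of hosts that have that many disks."""
    groups = defaultdict(list)
    for host, count in disk_counts.items():
        groups[count].append(host)
    return {count: sorted(hosts) for count, hosts in sorted(groups.items())}


# Hypothetical inventory: host name -> number of JBOD data disks.
inventory = {
    "dn01.example.com": 20,
    "dn02.example.com": 10,
    "dn03.example.com": 20,
}

for count, hosts in group_hosts_by_disk_count(inventory).items():
    print(f"{count}-disk group: {', '.join(hosts)}")
```

Each resulting group corresponds to one DataNode Role Config Group, and a new node would get the host template that points at the group matching its disk count.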

Thanks,
Darren

2 REPLIES

Expert Contributor
Thanks a lot for the info! Will review these docs.