
Newly added DataNodes won't join the party

New Contributor

Hello,

We're running a cluster with 12 DataNode servers, each with 12 physical disks mounted as follows:

/dev/sda5 on /grid/0 type ext4 (rw,noatime)
/dev/sdb1 on /grid/1 type ext4 (rw,noatime)
/dev/sdc1 on /grid/2 type ext4 (rw,noatime)
/dev/sdd1 on /grid/3 type ext4 (rw,noatime)
/dev/sde1 on /grid/4 type ext4 (rw,noatime)
/dev/sdf1 on /grid/5 type ext4 (rw,noatime)
/dev/sdg1 on /grid/6 type ext4 (rw,noatime)
/dev/sdh1 on /grid/7 type ext4 (rw,noatime)
/dev/sdi1 on /grid/8 type ext4 (rw,noatime)
/dev/sdj1 on /grid/9 type ext4 (rw,noatime)
/dev/sdk1 on /grid/10 type ext4 (rw,noatime)
/dev/sdl1 on /grid/11 type ext4 (rw,noatime)

We've tried adding 5 newly deployed DataNodes, which are more powerful in every respect (capacity, CPU, and RAM), with the following disk layout:

/dev/sda1 on /grid/0 type ext4 (rw,noatime)
/dev/sdb1 on /grid/1 type ext4 (rw,noatime)
/dev/sdc1 on /grid/2 type ext4 (rw,noatime)
/dev/sdd1 on /grid/3 type ext4 (rw,noatime)
/dev/sde1 on /grid/4 type ext4 (rw,noatime)
/dev/sdf1 on /grid/5 type ext4 (rw,noatime)
/dev/sdg1 on /grid/6 type ext4 (rw,noatime)
/dev/sdh1 on /grid/7 type ext4 (rw,noatime)
/dev/sdi1 on /grid/8 type ext4 (rw,noatime)
/dev/sdj1 on /grid/9 type ext4 (rw,noatime)
/dev/sdk1 on /grid/10 type ext4 (rw,noatime)
/dev/sdl1 on /grid/11 type ext4 (rw,noatime)
/dev/sdm1 on /grid/12 type ext4 (rw,noatime)
/dev/sdn1 on /grid/13 type ext4 (rw,noatime)
/dev/sdo1 on /grid/14 type ext4 (rw,noatime)
/dev/sdp1 on /grid/15 type ext4 (rw,noatime)

The new DataNodes have 4 extra disks, so we added /grid/12/hadoop/hdfs/data, /grid/13/hadoop/hdfs/data, /grid/14/hadoop/hdfs/data and /grid/15/hadoop/hdfs/data to the DataNode directories in the HDFS configuration.
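
For reference, the property we extended is dfs.datanode.data.dir. A quick way to double-check what a node actually picked up after the change (a small sketch; assumes the hdfs client is on the PATH):

# Print the data directories this node's configuration resolves to
hdfs getconf -confKey dfs.datanode.data.dir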

Everywhere we searched it was written that directories which do not exist will be ignored (the original DataNodes lack the /grid/12, 13, 14 and 15 mount points).

What actually happened is that on the original DataNodes the folders /grid/12, /grid/13, /grid/14 and /grid/15 were created and are filling up with HDFS data. Since those paths sit on the root filesystem (/) and not on a dedicated block device, space is about to run out, which is probably not a good thing.

How should we proceed? How can we remove the data that landed there to free up space on the root (/) partition?
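
In case it helps anyone hitting the same thing, this is how we confirmed where the data landed (plain coreutils, run on one of the original DataNodes):

# df reports the filesystem backing a path: for /grid/12 it shows /,
# since nothing is mounted there, unlike /grid/0 through /grid/11
df -h /grid/12
# how much root-partition space the stray block data is consuming
du -sh /grid/12 /grid/13 /grid/14 /grid/15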

Thanks,

1 ACCEPTED SOLUTION


@auto gun Look into Host Groups to manage hosts with different configurations.

https://developer.ibm.com/hadoop/blog/2015/11/10/override-component-configurations-with-ambari-confi...

You can create one host group for the hosts with 12 disks and a second host group for the hosts with 16 disks, each with its own list of DataNode directories. Once the groups are correctly applied, HDFS will re-replicate the data that landed under 12/, 13/, 14/ and 15/ on the initial nodes over to the new nodes. At that point you can free your space.
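
A minimal sketch of that cleanup on each original DataNode, once the config groups are applied and the DataNodes restarted (standard hdfs CLI; verify the fsck summary on your version before deleting anything):

# wait until HDFS has re-replicated the blocks that lived under /grid/12..15
hdfs fsck / | grep -E 'Status|Under-replicated|Missing'
# only when fsck reports HEALTHY and no missing blocks, reclaim the space
rm -rf /grid/12 /grid/13 /grid/14 /grid/15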


3 REPLIES


New Contributor

I followed your explanation and it works on our setup.

Thanks @Shishir Saxena!

New Contributor

@Shishir Saxena Thanks for your input.

Right now I don't understand how to remove /grid/1{2..5} from the first 12 DataNodes.

Can I just 'rm -rf' these folders?