
Impact of growing a Datanode Volume

Rising Star

I think I am asking a slightly different question than the one asked here:

https://community.hortonworks.com/questions/6796/how-to-increase-datanode-filesystem-size.html

but a solution should help both.

SAN issues aside!

Is there a method to expand the volume under a DataNode directory and have HDFS recognize the newly allocated space? For instance, if we were to mount a virtual filesystem, say a NetApp volume, in CentOS and then expand that filesystem, how would one make the change known to HDFS?

1 ACCEPTED SOLUTION

Super Guru

@wsalazar

I agree with Neeraj.

Yes, you can expand the volume under a DataNode directory and make the new space available to HDFS.

There are two basic things you always need to take care of after increasing/extending an existing volume:

1. OS side: Make sure the volume is reflected at its new/extended size (e.g., on Linux you can use partprobe/kpartx to re-read the device; for an LVM-backed filesystem, resize2fs; for a multipath volume, kpartx). Once the new size is visible to the OS, HDFS picks up the additional DataNode capacity automatically, with no restart required (a quick verification sketch follows this list).

2. HDFS side: To distribute data evenly across all DataNodes, run the "Rebalancer" from the cluster UI or the command line.
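For reference, here is a minimal sketch of how you might verify both sides from the DataNode host. It assumes a Linux host with the df and hdfs CLIs on the PATH, and the data directory path /hadoop/hdfs/data is only an illustrative placeholder (use whatever dfs.datanode.data.dir points to on your cluster); none of these specifics come from this thread.

```python
#!/usr/bin/env python3
"""Rough sketch: confirm the OS and HDFS both see the extended volume.

Assumptions (not from this thread): runs on the DataNode host, the
data directory is /hadoop/hdfs/data (placeholder for the value of
dfs.datanode.data.dir), and the `df`/`hdfs` commands are on the PATH.
"""
import subprocess

DATA_DIR = "/hadoop/hdfs/data"  # placeholder for dfs.datanode.data.dir


def run(cmd):
    """Run a command and return its stdout, raising if it fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# OS side: after partprobe/resize2fs/kpartx, df should report the new size.
print(run(["df", "-h", DATA_DIR]))

# HDFS side: dfsadmin -report lists configured capacity per DataNode;
# it should reflect the larger volume without a DataNode restart.
print(run(["hdfs", "dfsadmin", "-report"]))
```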


2 REPLIES

Master Mentor

@wsalazar

You can increase the size and, to be on the safe side, run a rebalance afterwards. See https://wiki.apache.org/hadoop/FAQ
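To make the rebalance step concrete, here is a hedged sketch of kicking it off from the command line; the 10% threshold is just an example value, not something specified in this thread, and the cluster UI typically exposes the same action.

```python
#!/usr/bin/env python3
"""Rough sketch: run the HDFS balancer after growing a volume.

Assumptions (not from this thread): the `hdfs` CLI is on the PATH and
is run as a user with HDFS admin rights; the 10% threshold is only an
example and should be tuned per cluster.
"""
import subprocess

# Move blocks until every DataNode's utilization is within 10% of the
# cluster average; equivalent to running `hdfs balancer -threshold 10`.
subprocess.run(["hdfs", "balancer", "-threshold", "10"], check=True)
```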
