HDFS disk hot swap - HBase process hangs?

Hi everybody,


While hot-swapping one of the HDFS disks, we ran into an unusual situation. We removed the failed disk from the HDFS configuration in the normal way, but some HBase processes were still trying to access it:


[root@hostname ~]# lsof | grep "/dfs/2/dn"

java      25935    hbase  285r      REG               8,33       973   38930647 /dfs/2/dn/dn/current/BP-854419853-
java      25935    hbase  286r      REG               8,33        15   38930648 /dfs/2/dn/dn/current/BP-854419853-
java      25935    hbase  293r      REG               8,33      1041   38930649 /dfs/2/dn/dn/current/BP-854419853-
java      25935    hbase  294r      REG               8,33        19   38930650 /dfs/2/dn/dn/current/BP-854419853-
java      25935    hbase  299r      REG               8,33      2509   38930671 /dfs/2/dn/dn/current/BP-854419853-
java      25935    hbase  300r      REG               8,33        27   38930672 /dfs/2/dn/dn/current/BP-854419853-
jsvc      32041     hdfs  212u      REG               8,33        11   38930670 /dfs/2/dn/dn/current/BP-854419853-
jsvc      32041     hdfs  213r      REG               8,33        83   38930669 /dfs/2/dn/dn/current/BP-854419853-


We could simply kill them, but a quick "ps -ef" showed that they belonged to an active HBase RegionServer. For lack of a better solution, we restarted the RegionServer and the processes disappeared as expected. The problem is that, because of the active file handles on the faulty mount point, the OS won't let us unmount it (we use CentOS 6.x). Furthermore, simply killing the hung processes can take down a healthy HBase instance (we tried that as well).
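
In case it helps anyone checking for the same thing before an unmount, here is a minimal sketch that summarizes which processes still hold handles under a given mount point. It only parses text saved from lsof, and it assumes the nine-column layout shown above (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME); newer lsof versions may add columns. The function name holders_from_lsof and the shortened sample are mine, not part of any Hadoop or lsof tooling.

```python
from collections import Counter


def holders_from_lsof(lsof_text, mount_prefix):
    """Count open handles per (command, pid, user) under mount_prefix.

    Assumes the nine-column lsof layout:
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
    """
    prefix = mount_prefix.rstrip("/")
    counts = Counter()
    for line in lsof_text.splitlines():
        # Split into at most 9 fields so a NAME containing spaces stays whole.
        fields = line.split(None, 8)
        if len(fields) < 9 or not fields[1].isdigit():
            continue  # blank line, header row, or unexpected layout
        command, pid, user, name = fields[0], int(fields[1]), fields[2], fields[8]
        # Match the mount point itself or anything below it,
        # but not a sibling such as /dfs/20.
        if name == prefix or name.startswith(prefix + "/"):
            counts[(command, pid, user)] += 1
    return counts


# Shortened sample taken from the output above (paths are truncated there too).
sample = """\
java      25935    hbase  285r      REG               8,33       973   38930647 /dfs/2/dn/dn/current/BP-854419853-
java      25935    hbase  286r      REG               8,33        15   38930648 /dfs/2/dn/dn/current/BP-854419853-
jsvc      32041     hdfs  212u      REG               8,33        11   38930670 /dfs/2/dn/dn/current/BP-854419853-
"""

for (command, pid, user), handles in holders_from_lsof(sample, "/dfs/2").items():
    print(f"{command} pid={pid} user={user} open_handles={handles}")
```

Grouping by PID makes it obvious that the handles belong to the RegionServer and DataNode JVMs rather than to stray processes, which is why killing them takes the service down.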


Does anybody know what could cause this behavior (we reproduced it three times, on different servers)? Restarting a service instance such as a RegionServer (or any other redundant CDH service) is not a big deal, but the hot-swap procedure doesn't mention that this could be required, right?

Re: HDFS disk hot swap - HBase process hangs?

Any thoughts on this? I've managed to reproduce the issue several times, and it looks like it is not limited to HBase.


When a disk fails, HDFS processes keep holding on to it even after the HDFS directories are refreshed. Restarting the HDFS DataNode cleans this up, but ..

Re: HDFS disk hot swap - HBase process hangs?

@mat15, I have moved your topic to our Storage board in the hope that the experts here can confirm my suspicion. Your issue seems related to a Technical Service Bulletin we released to our customers, describing how HDFS can run into issues when a disk is swapped out on a DataNode. The public JIRA capturing the issue is HDFS-7960; it should contain the details you need.

Re: HDFS disk hot swap - HBase process hangs?

Thanks for the quick feedback. Yes, it looks like it could be the HDFS-7960 issue ..