Explorer
Posts: 19
Registered: ‎09-18-2015

HDFS disk hot swap - HBase process hangs?

Hi everybody,

 

While hot-replacing one of the HDFS disks, we ran into an unusual situation. We removed the failed disk from the HDFS configuration in the normal way, but some HBase processes were still trying to access it:

 

[root@hostname ~]# lsof | grep "/dfs/3/dn"

java      25935    hbase  285r      REG               8,33       973   38930647 /dfs/2/dn/dn/current/BP-854419853-192.168.100.101-1450451673611/current/finalized/subdir1/subdir47/blk_1073819408
java      25935    hbase  286r      REG               8,33        15   38930648 /dfs/2/dn/dn/current/BP-854419853-192.168.100.101-1450451673611/current/finalized/subdir1/subdir47/blk_1073819408_78731.meta
java      25935    hbase  293r      REG               8,33      1041   38930649 /dfs/2/dn/dn/current/BP-854419853-192.168.100.101-1450451673611/current/finalized/subdir1/subdir47/blk_1073819411
java      25935    hbase  294r      REG               8,33        19   38930650 /dfs/2/dn/dn/current/BP-854419853-192.168.100.101-1450451673611/current/finalized/subdir1/subdir47/blk_1073819411_78734.meta
java      25935    hbase  299r      REG               8,33      2509   38930671 /dfs/2/dn/dn/current/BP-854419853-192.168.100.101-1450451673611/current/finalized/subdir1/subdir52/blk_1073820774
java      25935    hbase  300r      REG               8,33        27   38930672 /dfs/2/dn/dn/current/BP-854419853-192.168.100.101-1450451673611/current/finalized/subdir1/subdir52/blk_1073820774_80097.meta
jsvc      32041     hdfs  212u      REG               8,33        11   38930670 /dfs/2/dn/dn/current/BP-854419853-192.168.100.101-1450451673611/current/rbw/blk_1073823982_83317.meta
jsvc      32041     hdfs  213r      REG               8,33        83   38930669 /dfs/2/dn/dn/current/BP-854419853-192.168.100.101-1450451673611/current/rbw/blk_1073823982
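For anyone who wants to script the same check as the lsof command above, here is a minimal sketch that groups the open files under a data directory by PID. It assumes a Linux host where /proc/<pid>/fd is readable (typically as root); the function name open_files_under and its proc_root parameter are made up for this illustration and are not part of any Hadoop or CDH tooling.

```python
import os


def open_files_under(prefix, proc_root="/proc"):
    """Return {pid: [paths]} for every open file located under `prefix`.

    Hypothetical helper: it walks /proc/<pid>/fd, which is roughly what
    `lsof | grep <prefix>` reports for regular files.
    """
    prefix = prefix.rstrip("/") + "/"
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process already exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were scanning
            if target.startswith(prefix):
                holders.setdefault(int(entry), []).append(target)
    return holders
```

Calling open_files_under("/dfs/3/dn") as root should return one entry per process that still holds a handle under that directory. An empty result suggests nothing is blocking the unmount.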

 

We could simply have killed them, but a quick "ps -ef" showed that they belonged to the active HBase RegionServer. Lacking a better solution, we restarted the RegionServer and the open file handles disappeared, as expected. The problem is that, because of the active file handles on the faulty mount point, the OS won't allow an unmount (we use CentOS 6.x). Furthermore, a simple kill on the hung processes can terminate a healthy HBase instance (we tried that too).
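Before killing or restarting anything, it can help to confirm which service owns a given PID, as the "ps -ef" check did. The sketch below reads /proc/<pid>/cmdline on Linux. The helper names command_line and looks_like are hypothetical, and matching on "HRegionServer" (the RegionServer main class name) is an assumption about how the process was launched on a particular cluster.

```python
import os


def command_line(pid, proc_root="/proc"):
    """Return the argument list of `pid`, or [] if it cannot be read."""
    path = os.path.join(proc_root, str(pid), "cmdline")
    try:
        with open(path, "rb") as f:
            raw = f.read()
    except OSError:
        return []  # process is gone or not readable
    # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]


def looks_like(pid, marker, proc_root="/proc"):
    """True if any argument of `pid` contains `marker`."""
    return any(marker in arg for arg in command_line(pid, proc_root))
```

For example, looks_like(25935, "HRegionServer") would indicate that the PID from the lsof listing is a RegionServer, so a restart through the cluster manager is safer than a plain kill.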

 

Does anybody know what could cause this behavior (we reproduced it three times, on different servers)? Having to restart a service instance like a RegionServer (or any other redundant CDH service) is not a big deal, but the hot-swap procedure doesn't mention that this could be required, right (http://www.cloudera.com/documentation/enterprise/latest/topics/admin_dn_swap.html)?

Explorer
Posts: 19
Registered: ‎09-18-2015

Re: HDFS disk hot swap - HBase process hangs?

Any thoughts on this? I've managed to reproduce the issue several times and it looks like it's not related to HBase only.

 

When a disk fails, the HDFS processes keep holding on to it after the HDFS data directories are refreshed. Restarting the HDFS DataNode cleans this up, but ..

Posts: 416
Topics: 51
Kudos: 88
Solutions: 49
Registered: ‎06-26-2013

Re: HDFS disk hot swap - HBase process hangs?

@mat15, I have moved your topic to our Storage board in the hope that the experts here can confirm my suspicion: your issue seems related to a Technical Service Bulletin we released to our customers, which describes how HDFS can run into issues when a disk is swapped out on a DataNode. The public JIRA capturing the issue is HDFS-7960; it should contain the details you need.

Explorer
Posts: 19
Registered: ‎09-18-2015

Re: HDFS disk hot swap - HBase process hangs?

Thanks for the quick feedback. Yeah, it looks like it could be the HDFS-7960 issue ..
