HBase Region Server Failures
- Labels: Apache HBase
Created 05-04-2017 04:55 AM
Attachment: error-log-hbase.txt

My HBase RegionServers are failing frequently. I have googled the issue and tried a few of the suggested fixes, including the hdfs-site.xml changes below (made through Ambari), but there is no improvement. Disk space is not the problem; there is plenty of free space to store data. I have attached my log for reference. Kindly suggest.
dfs.client.block.write.replace-datanode-on-failure.enable=true
dfs.client.block.write.replace-datanode-on-failure.policy=DEFAULT
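For reference, these are HDFS client-side settings, so the HBase RegionServers only see the new values after a restart; a minimal way to confirm what a node actually resolves (assuming a standard Ambari-managed layout) is hdfs getconf:

hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.enable
hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.policy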
Created 05-04-2017 05:03 AM
Have you checked your HDFS health? You can check it using hdfs fsck.
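For example (an illustrative sketch; the /apps/hbase path is taken from the logs and may differ on your cluster):

hdfs fsck /                                        # overall health: corrupt, missing and under-replicated blocks
hdfs fsck /apps/hbase -files -blocks -locations    # per-file detail for the HBase root directory
hdfs dfsadmin -report                              # live/dead DataNodes and per-node capacity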
Created 05-05-2017 05:58 AM
Hi nshelke,
Yes, I have checked. HDFS is healthy and there are no DataNode failures, but I did find missing replicas.
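As a hedged follow-up: missing or under-replicated replicas can be listed with fsck, and the NameNode normally re-replicates them on its own; forcing the replication factor is only a nudge for stubborn paths:

hdfs fsck / | grep -iE "under[- ]replicated"    # which blocks/files are affected
hdfs dfs -setrep -w 3 /apps/hbase               # example: re-assert replication factor 3 on the HBase root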
Created 05-04-2017 02:59 PM
It would appear from the logs that you only have two datanodes. You don't have any datanodes to replace, therefore this property can't actually do anything. Either stabilize your datanodes, add more datanodes, or reduce the HDFS replication.
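A short illustrative sketch of both checks (commands and values are examples only):

hdfs dfsadmin -report | grep -E "Live datanodes|Dead datanodes"   # how many DataNodes the NameNode actually sees
hdfs dfs -setrep -w 2 /apps/hbase    # if reducing replication: lower existing HBase data to 2 replicas
# and lower dfs.replication via Ambari so newly written blocks follow suit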
Created 05-05-2017 05:55 AM
Hi Josh,
I have 5 DataNodes, all healthy, with no DataNode volume failures. Is there any other way to fix this issue? If you look at the log you can see these ERROR and FATAL messages repeated:
ERROR [RS_CLOSE_REGION-aps-hadoop5:16020-0] regionserver.HRegion: Memstore size is 147136.
FATAL [regionserver/aps-hadoop5/1..1..1..:16020.logRoller] regionserver.HRegionServer: ABORTING region server aps-hadoop5,16020,1493618413009: Failed log close in log roller.
Will this impact anything? Kindly suggest.
Created 05-05-2017 03:10 PM
Something is happening on your datanodes that is causing HBase to mark them as "bad":
2017-05-03 21:22:38,729 WARN [DataStreamer for file /apps/hbase/data/WALs/aps-hadoop5,16020,1493618413009/aps-hadoop5%2C16020%2C1493618413009.default.1493846432867 block BP-1810172115-10.64.228.157-1478343078462:blk_1079562185_5838908] hdfs.DFSClient: Error Recovery for block BP-1810172115-10.64.228.157-1478343078462:blk_1079562185_5838908 in pipeline DatanodeInfoWithStorage[1..1..1..:50010,DS-751946a0-5a6f-4485-ad27-61f061359410,DISK], DatanodeInfoWithStorage[10.64.228.140:50010,DS-8ab76f9c-ee05-4ec0-897a-8718ab89635f,DISK], DatanodeInfoWithStorage[10.64.228.150:50010,DS-57010fb6-92c0-4c3e-8b9e-11233ceb7bfa,DISK]: bad datanode DatanodeInfoWithStorage[1..1..1..:50010,DS-751946a0-5a6f-4485-ad27-61f061359410,DISK]
2017-05-03 21:22:41,744 INFO [DataStreamer for file /apps/hbase/data/WALs/aps-hadoop5,16020,1493618413009/aps-hadoop5%2C16020%2C1493618413009.default.1493846432867 block BP-1810172115-10.64.228.157-1478343078462:blk_1079562185_5838908] hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Got error, status message , ack with firstBadLink as 10.64.228.164:50010
    at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1393)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1217)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:904)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:411)
2017-05-03 21:22:41,745 WARN [DataStreamer for file /apps/hbase/data/WALs/aps-hadoop5,16020,1493618413009/aps-hadoop5%2C16020%2C1493618413009.default.1493846432867 block BP-1810172115-10.64.228.157-1478343078462:blk_1079562185_5838908] hdfs.DFSClient: Error Recovery for block BP-1810172115-10.64.228.157-1478343078462:blk_1079562185_5838908 in pipeline DatanodeInfoWithStorage[10.64.228.140:50010,DS-8ab76f9c-ee05-4ec0-897a-8718ab89635f,DISK], DatanodeInfoWithStorage[10.64.228.150:50010,DS-57010fb6-92c0-4c3e-8b9e-11233ceb7bfa,DISK], DatanodeInfoWithStorage[10.64.228.164:50010,DS-9ba4f08a-d996-4490-b27d-6c8ca9a67152,DISK]: bad datanode DatanodeInfoWithStorage[10.64.228.164:50010,DS-9ba4f08a-d996-4490-b27d-6c8ca9a67152,DISK]
2017-05-03 21:22:44,779 INFO [DataStreamer for file /apps/hbase/data/WALs/aps-hadoop5,16020,1493618413009/aps-hadoop5%2C16020%2C1493618413009.default.1493846432867 block BP-1810172115-10.64.228.157-1478343078462:blk_1079562185_5838908] hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Got error, status message , ack with firstBadLink as 10.64.228.141:50010
    at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1393)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1217)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:904)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:411)
I'd look in those datanode logs and figure out why they failed to respond to HBase writing data. It seems like HBase gets down to only the datanodes it can actually talk to (out of your five). In general, your HDFS seems very unstable: at one point it took over 70 seconds to sync data, which should be a sub-second operation. (A sketch of where to start in the DataNode logs follows the log excerpt below.)
2017-05-03 21:22:44,782 INFO [sync.0] wal.FSHLog: Slow sync cost: 72065 ms, current pipeline: [DatanodeInfoWithStorage[10.64.228.140:50010,DS-8ab76f9c-ee05-4ec0-897a-8718ab89635f,DISK], DatanodeInfoWithStorage[10.64.228.150:50010,DS-57010fb6-92c0-4c3e-8b9e-11233ceb7bfa,DISK]]
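A possible starting point for that digging (the log path is illustrative and depends on the install):

# on each DataNode that shows up as "bad" in the pipeline above
grep -E " (WARN|ERROR|FATAL) " /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log | grep "2017-05-03 21:2"
# in the same window, also look for long GC pauses, disk errors in dmesg, and network flaps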
Created 05-07-2017 05:59 AM
Something does not look right about this one DataNode, which is identified with an unusual IP address: 1..1..1..:50010
2017-05-03 21:22:38,729 WARN [DataStreamer for file /apps/hbase/data/WALs/aps-hadoop5,16020,1493618413009/aps-hadoop5%2C16020%2C1493618413009.default.1493846432867 block BP-1810172115-10.64.228.157-1478343078462:blk_1079562185_5838908] hdfs.DFSClient: Error Recovery for block BP-1810172115-10.64.228.157-1478343078462:blk_1079562185_5838908 in pipeline DatanodeInfoWithStorage[1..1..1..:50010,DS-751946a0-5a6f-4485-ad27-61f061359410,DISK], DatanodeInfoWithStorage[10.64.228.140:50010,DS-8ab76f9c-ee05-4ec0-897a-8718ab89635f,DISK], DatanodeInfoWithStorage[10.64.228.150:50010,DS-57010fb6-92c0-4c3e-8b9e-11233ceb7bfa,DISK]: bad datanode DatanodeInfoWithStorage[1..1..1..:50010,DS-751946a0-5a6f-4485-ad27-61f061359410,DISK]
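If that address is not simply redacted in the paste, it may be worth confirming how each DataNode registered with the NameNode and what the suspect host resolves for itself (a hedged sketch):

hdfs dfsadmin -report | grep -E "^Name:|^Hostname:"   # the address each DataNode registered with
hostname -f                                           # on the suspect DataNode itself
getent hosts "$(hostname -f)"                         # check /etc/hosts and DNS resolution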
