DamagedWALException cause all hbase regionserver restart

Hello everyone,

My HBase cluster uses Microsoft Azure Storage, but lately its RegionServers have been restarting frequently because of the following error. What is the cause? Can anyone help analyze it? Thanks!

2017-03-05 03:03:48,359 ERROR [regionserver/***/***:16020.logRoller] wal.FSHLog: Failed close of WAL writer wasb://***/hbase/WALs/****,16020,1488594155314/***%2C16020%2C1488594155314.default.1488680586523, unflushedEntries=240
org.apache.hadoop.hbase.regionserver.wal.FailedSyncBeforeLogCloseException: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: On sync
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SafePointZigZagLatch.waitSafePoint(FSHLog.java:1914)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.replaceWriter(FSHLog.java:973)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:740)
    at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:148)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: On sync
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:2094)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1971)
    at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more
Caused by: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: Failed appending 338191, requesting roll of WAL
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:2208)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:2049)
    ... 5 more
Caused by: java.io.IOException: com.microsoft.azure.storage.StorageException: The server encountered an unknown failure:
    at org.apache.hadoop.fs.azure.PageBlobOutputStream$WriteRequest.writePayloadToServer(PageBlobOutputStream.java:362)
    at org.apache.hadoop.fs.azure.PageBlobOutputStream$WriteRequest.runInternal(PageBlobOutputStream.java:318)
    at org.apache.hadoop.fs.azure.PageBlobOutputStream$WriteRequest.run(PageBlobOutputStream.java:252)
    ... 3 more
Caused by: com.microsoft.azure.storage.StorageException: The server encountered an unknown failure:
    at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:101)
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:219)
    at com.microsoft.azure.storage.blob.CloudPageBlob.putPagesInternal(CloudPageBlob.java:634)
    at com.microsoft.azure.storage.blob.CloudPageBlob.uploadPages(CloudPageBlob.java:971)
    at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudPageBlobWrapperImpl.uploadPages(StorageInterfaceImpl.java:485)
    at org.apache.hadoop.fs.azure.PageBlobOutputStream$WriteRequest.writePayloadToServer(PageBlobOutputStream.java:353)
    ... 5 more
Caused by: java.io.IOException: Error writing to server
    at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:666)
    at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:678)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1534)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:126)
    ... 9 more
2017-03-05 03:03:48,362 FATAL [regionserver/T-BJ-HDP-08/192.168.0.52:16020.logRoller] regionserver.HRegionServer: ABORTING region server t-bj-hdp-08,16020,1488594155314: Failed log close in log roller
org.apache.hadoop.hbase.regionserver.wal.FailedLogCloseException: wasb://***/hbase/WALs/****,16020,1488594155314****%2C16020%2C1488594155314.default.1488680586523, unflushedEntries=240
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.replaceWriter(FSHLog.java:1025)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:740)
    at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:148)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.regionserver.wal.FailedSyncBeforeLogCloseException: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: On sync
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SafePointZigZagLatch.waitSafePoint(FSHLog.java:1914)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.replaceWriter(FSHLog.java:973)
    ... 3 more
Caused by: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: On sync
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:2094)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1971)
    at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more
Caused by: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: Failed appending 338191, requesting roll of WAL
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:2208)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:2049)
    ... 5 more
Caused by: java.io.IOException: com.microsoft.azure.storage.StorageException: The server encountered an unknown failure:
    at org.apache.hadoop.fs.azure.PageBlobOutputStream$WriteRequest.writePayloadToServer(PageBlobOutputStream.java:362)
    at org.apache.hadoop.fs.azure.PageBlobOutputStream$WriteRequest.runInternal(PageBlobOutputStream.java:318)
    at org.apache.hadoop.fs.azure.PageBlobOutputStream$WriteRequest.run(PageBlobOutputStream.java:252)
    ... 3 more
Caused by: com.microsoft.azure.storage.StorageException: The server encountered an unknown failure:
    at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:101)
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:219)
    at com.microsoft.azure.storage.blob.CloudPageBlob.putPagesInternal(CloudPageBlob.java:634)
    at com.microsoft.azure.storage.blob.CloudPageBlob.uploadPages(CloudPageBlob.java:971)
    at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudPageBlobWrapperImpl.uploadPages(StorageInterfaceImpl.java:485)
    at org.apache.hadoop.fs.azure.PageBlobOutputStream$WriteRequest.writePayloadToServer(PageBlobOutputStream.java:353)
    ... 5 more
Caused by: java.io.IOException: Error writing to server
    at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:666)
    at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:678)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1534)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:126)
    ... 9 more
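
To make the nesting in the trace above easier to follow, here is a minimal sketch that extracts the "Caused by:" chain from log text like this. The helper name `cause_chain` and the shortened `sample` string are assumptions for illustration only; the sample is abbreviated from the trace above and is not a complete log. It only lists the messages in order and does not diagnose anything by itself.

```python
import re

# A "Caused by:" message runs until the first stack frame ("at pkg.Class.method(").
# This works whether the trace is on one line or split across lines.
CAUSE = re.compile(r"Caused by: (.*?)\s+at\s+[\w.$]+\(", re.DOTALL)


def cause_chain(log_text):
    """Return the 'Caused by:' messages in order, outermost first (hypothetical helper)."""
    return [" ".join(message.split()) for message in CAUSE.findall(log_text)]


# Abbreviated excerpt of the trace above, used only as sample input.
sample = (
    "org.apache.hadoop.hbase.regionserver.wal.FailedSyncBeforeLogCloseException: "
    "org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: On sync "
    "at org.apache.hadoop.hbase.regionserver.wal.FSHLog.replaceWriter(FSHLog.java:973) "
    "Caused by: java.io.IOException: com.microsoft.azure.storage.StorageException: "
    "The server encountered an unknown failure: "
    "at org.apache.hadoop.fs.azure.PageBlobOutputStream$WriteRequest.run(PageBlobOutputStream.java:252) "
    "... 3 more "
    "Caused by: java.io.IOException: Error writing to server "
    "at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:666)"
)

chain = cause_chain(sample)
for depth, message in enumerate(chain, start=1):
    print(depth, message)
```

In the full trace, the innermost entry of the chain is the `java.io.IOException: Error writing to server` raised while the Azure storage client was uploading pages for the WAL.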

I use HDP 2.5.

13237-hbase.png

My HBase parameters are:

13238-hbaseparamter1.png

13239-hbaseparameter3.png

13240-hbaesparamter5.png

13251-hbasepa6.png

Also, one RegionServer keeps logging the following error. Why? Is there any relationship between the two problems?

2017-03-05 04:16:43,407 ERROR [MemStoreFlusher.17] regionserver.MemStoreFlusher: Cache flusher failed for entry [flush region SysActionLog,20170224150141387#BASE#HSF#Teld.Base.SPI.IAccountService-GetUsableThirdAccs##f8d3026e-9918-43ce-9602-3b7f35026b88#192.168.0.10,1487925468836.f05b136d44caad668065660c94dd3c7b.]
java.lang.IndexOutOfBoundsException
    at java.nio.Buffer.checkIndex(Buffer.java:540)
    at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
    at org.apache.hadoop.hbase.util.ByteBufferUtils.toBytes(ByteBufferUtils.java:490)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:359)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:514)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1548)
    at org.apache.hadoop.hbase.regionserver.StoreFile.getFileSplitPoint(StoreFile.java:745)
    at org.apache.hadoop.hbase.regionserver.DefaultStoreFileManager.getSplitPoint(DefaultStoreFileManager.java:130)
    at org.apache.hadoop.hbase.regionserver.HStore.getSplitPoint(HStore.java:2074)
    at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:82)
    at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:7833)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:504)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
    at java.lang.Thread.run(Thread.java:745)
2017-03-05 04:17:45,422 INFO [MemStoreFlusher.0] regionserver.HRegion: Started memstore flush for SysActionLog,20170224150141387#BASE#HSF#Teld.Base.SPI.IAccountService-GetUsableThirdAccs##f8d3026e-9918-43ce-9602-3b7f