Created 01-27-2019 03:43 PM
2019-01-08 16:22:29,475 WARN [MemStoreFlusher.0] impl.BlockReaderFactory: I/O error constructing remote block reader.
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<ip>:51076 remote=/<ip>:50010]
2019-01-08 16:22:29,477 WARN [MemStoreFlusher.0] hdfs.DFSClient: Failed to connect to /<ip>:50010 for file /apps/hbase/data/data/default/EDA_ATTACHMENTS/376661f95c7be7f667a876480e732976/.tmp/DATA/92e54a03a44042a1be63a7ff04158792 for block BP-869721575-<ip>-1543446665241:blk_1073772872_32065, add to deadNodes and continue.
2019-01-08 16:31:06,275 ERROR [regionserver/hadoop-2:16020-shortCompactions-1546916652740] regionserver.CompactSplit: Compaction failed region=EDA_ATTACHMENTS,,1546990167772.005c417fdc141d22d49c63fe93014aa8., storeName=DATA, priority=96, startTime=1546990170152
org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://hadoop-1.nit.disa.mil:8020/apps/hbase/data/data/default/EDA_ATTACHMENTS/005c417fdc141d22d49c63fe93014aa8/.tmp/DATA/3e9c1942fe484a26a81ba5a2578a69d5
    at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:545)
    at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:579)
    at org.apache.hadoop.hbase.regionserver.StoreFileReader.<init>(StoreFileReader.java:104)
    at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:270)
    at org.apache.hadoop.hbase.regionserver.HStoreFile.open(HStoreFile.java:357)
    at org.apache.hadoop.hbase.regionserver.HStoreFile.initReader(HStoreFile.java:465)
    at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:683)
    at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:676)
    at org.apache.hadoop.hbase.regionserver.HStore.validateStoreFile(HStore.java:1858)
    at org.apache.hadoop.hbase.regionserver.HStore.moveFileIntoPlace(HStore.java:1431)
    at org.apache.hadoop.hbase.regionserver.HStore.moveCompactedFilesIntoPlace(HStore.java:1419)
    at org.apache.hadoop.hbase.regionserver.HStore.doCompaction(HStore.java:1387)
    at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2095)
    at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:592)
    at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:634)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-869721575-<ip>-1543446665241:blk_1073772893_32086 file=/apps/hbase/data/data/default/EDA_ATTACHMENTS/005c417fdc141d22d49c63fe93014aa8/.tmp/DATA/3e9c1942fe484a26a81ba5a2578a69d5
    at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:870)
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:853)
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:832)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:564)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:754)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:820)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:401)
    at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:532)
Created 01-27-2019 03:43 PM
I'm getting errors in the regionserver logs about "Could not obtain block". The above is one example, but there are other errors that all come down to the same inability to locate a block or file. The file does exist in HDFS. The regionservers can't recover and eventually crash, and when restarted they try to locate the block again, fail, and go down again.
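For a quick sanity check, fsck over the HBase root directory will show whether HDFS itself reports any missing or corrupt blocks (a minimal sketch, assuming hbase.rootdir is the /apps/hbase/data path seen in the logs above; adjust the path for your cluster):

# Scan everything under the HBase root dir and surface anything HDFS flags as unhealthy.
hdfs fsck /apps/hbase/data -files -blocks -locations | grep -iE 'missing|corrupt|status'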
Created 01-27-2019 03:43 PM
I see these "could not obtain block" errors in the log. Here's another one. The block is OK per hdfs fsck <path> -files -blocks, which makes me think HBase can't read the file because it can't read the HFile trailer. I need to figure out how to verify/validate and repair the HFile (a sketch of checking the trailer with the HFile tool follows the output below).
2019-01-14 09:58:00,575 ERROR [regionserver/hadoop-2:16020-shortCompactions-1547430414622] regionserver.CompactSplit: Compaction failed region=EDA_ATTACHMENTS,,1546990167772.005c417fdc141d22d49c63fe93014aa8., storeName=DATA, priority=96, startTime=1547484994585
org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://hadoop-1.nit.disa.mil:8020/apps/hbase/data/data/default/EDA_ATTACHMENTS/005c417fdc141d22d49c63fe93014aa8/.tmp/DATA/84c4ac0eb34048f88b2c6267eb4b0f1a
    at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:545)
    at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:579)
    at org.apache.hadoop.hbase.regionserver.StoreFileReader.<init>(StoreFileReader.java:104)
    at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:270)
    at org.apache.hadoop.hbase.regionserver.HStoreFile.open(HStoreFile.java:357)
    at org.apache.hadoop.hbase.regionserver.HStoreFile.initReader(HStoreFile.java:465)
    at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:683)
    at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:676)
    at org.apache.hadoop.hbase.regionserver.HStore.validateStoreFile(HStore.java:1858)
    at org.apache.hadoop.hbase.regionserver.HStore.moveFileIntoPlace(HStore.java:1431)
    at org.apache.hadoop.hbase.regionserver.HStore.moveCompactedFilesIntoPlace(HStore.java:1419)
    at org.apache.hadoop.hbase.regionserver.HStore.doCompaction(HStore.java:1387)
    at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1375)
    at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2095)
    at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:592)
    at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:634)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-869721575-207.132.83.245-1543446665241:blk_1073784662_43855 file=/apps/hbase/data/data/default/EDA_ATTACHMENTS/005c417fdc141d22d49c63fe93014aa8/.tmp/DATA/84c4ac0eb34048f88b2c6267eb4b0f1a
    at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:870)
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:853)

This shows the block is OK:

hdfs fsck /apps/hbase/data/data/default/EDA_ATTACHMENTS/005c417fdc141d22d49c63fe93014aa8/.tmp/DATA/84c4ac0eb34048f88b2c6267eb4b0f1a -files -blocks

/apps/hbase/data/data/default/EDA_ATTACHMENTS/005c417fdc141d22d49c63fe93014aa8/.tmp/DATA/84c4ac0eb34048f88b2c6267eb4b0f1a 85047198 bytes, replicated: replication=2, 1 block(s): OK
0. BP-869721575-207.132.83.245-1543446665241:blk_1073784662_43855 len=85047198 Live_repl=2
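To test whether the HFile trailer itself is readable, outside of a compaction, the HFile pretty-printer can be pointed at the store file directly (a minimal sketch using the path from the stack trace above; -m and -v only dump metadata, nothing is modified or repaired):

# Dump the metadata, including the trailer, of the store file named in the stack trace.
# If the trailer cannot be read, this should fail with a similar CorruptHFileException.
hbase hfile -m -v -f hdfs://hadoop-1.nit.disa.mil:8020/apps/hbase/data/data/default/EDA_ATTACHMENTS/005c417fdc141d22d49c63fe93014aa8/.tmp/DATA/84c4ac0eb34048f88b2c6267eb4b0f1a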
Created 01-27-2019 03:43 PM
This shows no corrupt HFiles:

./hbase hbck -checkCorruptHFiles

Checked 117 hfile for corruption
  HFiles corrupted:                0
  HFiles moved while checking:     0
  Mob files moved while checking:  0
Summary: OK
Mob summary: OK
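For reference, the same corruption check can also quarantine anything it finds instead of just reporting it (a sketch, assuming the hbck1 tool shown above; it moves the bad files aside, it does not repair them):

# Sideline any HFiles the corruption check flags, instead of only reporting them.
# Only run the hbck that ships with the installed HBase version.
./hbase hbck -sidelineCorruptHFiles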
Created 01-27-2019 03:43 PM
The problem appears to have been caused by the virus-scanning software that was running.
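A likely mitigation, not confirmed in this thread, is to exclude the DataNode block directories from on-access virus scanning; the directories to exclude can be read from the active configuration:

# Print the local directories this DataNode stores blocks in (dfs.datanode.data.dir);
# these are the candidates for the antivirus on-access exclusion list.
hdfs getconf -confKey dfs.datanode.data.dir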