Inconsistencies in HBase table (hbase hbck)

New Contributor

Hi guys,

I am using an HDP cluster (HDP-2.5.0.0) with HBase 1.1.2 and Phoenix 4.7.0. A couple of days ago we experienced a major crash that left inconsistencies in one of our tables (1.5 TB, plus 2 index tables, since we are using Phoenix). I ran hbase hbck and got the following results (snippet below):
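For reference, the report below came from a plain consistency check along these lines (the -details flag just prints the full per-region sections, and the redirect to a local file is only for convenience):

hbase hbck -details > /tmp/hbck-report.txt 2>&1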

---- Table 'SR_ACTIVITIES': overlap groups
There are 0 overlap groups with 0 overlapping regions
ERROR: Found inconsistency in table SR_ACTIVITIES
---- Table 'hbase:meta': region split map
:	[ { meta => hbase:meta,,1.1588230740, hdfs => hdfs://pcluster/apps/hbase/data/data/hbase/meta/1588230740, deployed => hbase-hdp206.lan,16020,1512001442898;hbase:meta,,1.1588230740, replicaId => 0 }, ]
null:
---- Table 'hbase:meta': overlap groups
There are 0 overlap groups with 0 overlapping regions
2017-12-02 00:32:56,453 INFO  [main] util.HBaseFsck: Computing mapping of all store files
................................................................................................................................................................
2017-12-02 00:32:57,075 INFO  [main] util.HBaseFsck: Validating mapping using HDFS state
ERROR: Found lingering reference file hdfs://pcluster/apps/hbase/data/data/default/SR_ACTIVITIES/3bb33b5f4fba26a9f56dec1aedd402c9/0/e59e308a41194bb5b8299c84b0818dca.d143c81eb14ac29cd4e92681fde8cd7a
ERROR: Found lingering reference file hdfs://pcluster/apps/hbase/data/data/default/SR_ACTIVITIES_INDEX_1/436271a35b6a313ac5fa69db4795e73b/0/d39348bde5134b12b0c79b8382f79a78.18ff59ef5cd0d918247f1419625020ab
ERROR: Found lingering reference file hdfs://pcluster/apps/hbase/data/data/default/SR_ACTIVITIES_INDEX_1/e97a62ccee7217b9492de2efabfcfd3c/0/ad9129b0813a4f4892c51688140f4a03.2594d95578b8946187267112cc0d4098
ERROR: Found lingering reference file hdfs://pcluster/apps/hbase/data/data/default/SR_ACTIVITIES/3bb33b5f4fba26a9f56dec1aedd402c9/0/2705dd256c0e49a39152d7958c074e17.d143c81eb14ac29cd4e92681fde8cd7a
ERROR: Found lingering reference file hdfs://pcluster/apps/hbase/data/data/default/SR_ACTIVITIES_INDEX_1/0eceec2ec0e6e2d732f2b7a028fee2c7/0/6de9e548483e4550adee0c8ed51c7ddc.9695332dd4db41f2f4b98a5da0966eed
ERROR: Found lingering reference file hdfs://pcluster/apps/hbase/data/data/default/SR_ACTIVITIES/f5b32587753648b53b78261887361758/0/2826951b5325407e9027a1c005d5ee93.623d2dc65245bf6360c79a184e527705
ERROR: Found lingering reference file hdfs://pcluster/apps/hbase/data/data/default/SR_ACTIVITIES_INDEX_1/0eceec2ec0e6e2d732f2b7a028fee2c7/0/8f85f60216254247aa2c7c66e29b4d76.9695332dd4db41f2f4b98a5da0966eed
ERROR: Found lingering reference file hdfs://pcluster/apps/hbase/data/data/default/SR_ACTIVITIES_INDEX_1/b688d235f38a437c0454e1b20a9e9e5e/0/36bc175dd8914291a915ea54ab0170d0.394b35ad4f3db41e06aae8d768837098
ERROR: Found lingering reference file hdfs://pcluster/apps/hbase/data/data/default/SR_ACTIVITIES_INDEX_1/436271a35b6a313ac5fa69db4795e73b/0/d9a7948e713b4053b337b8002846a26d.18ff59ef5cd0d918247f1419625020ab
ERROR: Found lingering reference file hdfs://pcluster/apps/hbase/data/data/default/SR_ACTIVITIES_INDEX_1/436271a35b6a313ac5fa69db4795e73b/0/36cc3329b3f14b429d762c9d6b87dc5f.18ff59ef5cd0d918247f1419625020ab
ERROR: Found lingering reference file hdfs://pcluster/apps/hbase/data/data/default/SR_ACTIVITIES/f5b32587753648b53b78261887361758/0/fb8679074112418d9ca9db4b7e506e37.623d2dc65245bf6360c79a184e527705
2017-12-02 00:32:57,078 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase Fsck connecting to ZooKeeper ensemble=master-hdp202.lan:2181,master-hdp201.lan:2181,master-hdp203.lan:2181
2017-12-02 00:32:57,078 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=master-hdp202.lan:2181,master-hdp201.lan:2181,master-hdp203.lan:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@1c12f3ee
2017-12-02 00:32:57,157 INFO  [main-SendThread(master-hdp202.lan:2181)] zookeeper.ClientCnxn: Opening socket connection to server master-hdp202.lan/10.14.0.102:2181. Will not attempt to authenticate using SASL (unknown error)
2017-12-02 00:32:57,158 INFO  [main-SendThread(master-hdp202.lan:2181)] zookeeper.ClientCnxn: Socket connection established to master-hdp202.lan/10.14.0.102:2181, initiating session
2017-12-02 00:32:57,159 INFO  [main-SendThread(master-hdp202.lan:2181)] zookeeper.ClientCnxn: Session establishment complete on server master-hdp202.lan/10.14.0.102:2181, sessionid = 0x25aae5ddf40428e, negotiated timeout = 90000
2017-12-02 00:32:57,190 INFO  [main] zookeeper.ZooKeeper: Session: 0x25aae5ddf40428e closed
2017-12-02 00:32:57,190 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-12-02 00:32:57,190 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase Fsck connecting to ZooKeeper ensemble=master-hdp202.lan:2181,master-hdp201.lan:2181,master-hdp203.lan:2181
2017-12-02 00:32:57,190 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=master-hdp202.lan:2181,master-hdp201.lan:2181,master-hdp203.lan:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@47c64cfe
2017-12-02 00:32:57,192 INFO  [main-SendThread(master-hdp203.lan:2181)] zookeeper.ClientCnxn: Opening socket connection to server master-hdp203.lan/10.14.0.103:2181. Will not attempt to authenticate using SASL (unknown error)
2017-12-02 00:32:57,193 INFO  [main-SendThread(master-hdp203.lan:2181)] zookeeper.ClientCnxn: Socket connection established to master-hdp203.lan/10.14.0.103:2181, initiating session
2017-12-02 00:32:57,196 INFO  [main-SendThread(master-hdp203.lan:2181)] zookeeper.ClientCnxn: Session establishment complete on server master-hdp203.lan/10.14.0.103:2181, sessionid = 0x35aae5dde234249, negotiated timeout = 90000
2017-12-02 00:32:57,206 INFO  [main] zookeeper.ZooKeeper: Session: 0x35aae5dde234249 closed
2017-12-02 00:32:57,206 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-12-02 00:32:57,516 INFO  [main] util.HBaseFsck: Finishing hbck
Summary:
Table SR_ACTIVITIES is okay.
    Number of regions: 445
    Deployed on:  hbase-hdp201.lan,16020,1512001916143 hbase-hdp202.lan,16020,1512002519632 hbase-hdp203.lan,16020,1512001180484 hbase-hdp204.lan,16020,1512001283591 hbase-hdp205.lan,16020,1512001554528 hbase-hdp206.lan,16020,1512001442898
Table hbase:meta is okay.
    Number of regions: 1
    Deployed on:  hbase-hdp206.lan,16020,1512001442898
15 inconsistencies detected.
Status: INCONSISTENT
2017-12-02 00:32:57,516 INFO  [main] client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2017-12-02 00:32:57,516 INFO  [main] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x15ba5e122b09ebb
2017-12-02 00:32:57,517 INFO  [main] zookeeper.ZooKeeper: Session: 0x15ba5e122b09ebb closed
2017-12-02 00:32:57,517 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down

I have a snapshot from the morning before the crash, but when I try to simply copy (export) it to another HDFS cluster, I get a FileNotFoundException:

2017-11-30 13:19:54,438 INFO  [main] mapreduce.Job: Task Id : attempt_1493215799486_47105_m_000006_2, Status : FAILED
Error: java.io.FileNotFoundException: Unable to open link: org.apache.hadoop.hbase.io.HFileLink locations=[hdfs://pcluster/apps/hbase/data/data/default/SR_ACTIVITIES/623d2dc65245bf6360c79a184e527705/0/fb8679074112418d9ca9db4b7e506e37, hdfs://pcluster/apps/hbase/data/.tmp/data/default/SR_ACTIVITIES/623d2dc65245bf6360c79a184e527705/0/fb8679074112418d9ca9db4b7e506e37, hdfs://pcluster/apps/hbase/data/mobdir/data/default/SR_ACTIVITIES/623d2dc65245bf6360c79a184e527705/0/fb8679074112418d9ca9db4b7e506e37, hdfs://pcluster/apps/hbase/data/archive/data/default/SR_ACTIVITIES/623d2dc65245bf6360c79a184e527705/0/fb8679074112418d9ca9db4b7e506e37]
        at org.apache.hadoop.hbase.io.FileLink.getFileStatus(FileLink.java:390)
        at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.getSourceFileStatus(ExportSnapshot.java:472)
        at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.copyFile(ExportSnapshot.java:255)
        at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:197)
        at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:123)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
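For completeness, the export was started with the standard ExportSnapshot tool, along these lines (the snapshot name, destination and mapper count below are placeholders, not my exact values):

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot SR_ACTIVITIES_MORNING_SNAPSHOT \
  -copy-to hdfs://backupcluster/apps/hbase \
  -mappers 16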

However, I am able to find that file, just under a different region directory:

[hdfs@hbase-hdp201 ~]$ hdfs dfs -find /apps/hbase -name fb8679074112418d9ca9db4b7e506e37 -print
/apps/hbase/data/data/default/SR_ACTIVITIES/f5b32587753648b53b78261887361758/0/fb8679074112418d9ca9db4b7e506e37
[hdfs@hbase-hdp201 ~]$

I believe this is somehow related to region splitting, and that manually moving those files or running hbck could help, but could you point me in the right and safe direction?
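To make the question concrete: I have not run any repair yet, but by "using hbck" I had the targeted option for lingering reference files in mind, something like the sketch below (listed only to illustrate the direction I am considering, please correct me if it is wrong):

# sideline the lingering reference files in the two affected tables
sudo -u hbase hbase hbck -fixReferenceFiles SR_ACTIVITIES SR_ACTIVITIES_INDEX_1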

Thanks in advance!

Robert

1 REPLY

Cloudera Employee

@Robert Jonczy To fix the issue, could you please try running the hbck tool with the -repair option?
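A minimal sketch of that, assuming you run it as the hbase service user and scope it to the affected tables (in this HBase version -repair should act as a shortcut for the individual -fix* options, including -fixReferenceFiles for the lingering reference files reported above, so please review the report before and after):

sudo -u hbase hbase hbck -repair SR_ACTIVITIES SR_ACTIVITIES_INDEX_1
# re-check afterwards
sudo -u hbase hbase hbck -details SR_ACTIVITIES SR_ACTIVITIES_INDEX_1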
