
HBase hung command


Hi,

I have a cluster running HBase with 1 Master and 6 RegionServers. Recently I noticed that HBase commands against one table are queuing up and appear to be hung (stuck in RUNNABLE state).
When checking the state of the table, I can see that it is disabled. An enable command executed from the HBase shell failed, stating that procedure id xxx (a previous truncate command) is already running. In the HBase Master UI, both the truncate and the enable commands (shown under Procedures) are in RUNNABLE state.

I tried the kill procedure command on the truncate procedure, but it returns false, indicating the procedure cannot be killed.
Data on other tables is unaffected, and scan commands on those tables work fine.
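For reference, this is a minimal sketch of how the stuck procedure was inspected and the abort attempted from the HBase shell (the procedure id 12345 is illustrative; the real id comes from `list_procedures` or the Master UI's Procedures page):

```shell
# Inspect running procedures and attempt to abort the stuck one.
# Assumes HBase 1.1+ where the shell provides list_procedures /
# abort_procedure; the table name matches the logs below.
hbase shell <<'EOF'
list_procedures
abort_procedure 12345
is_enabled 'buckets_2'
EOF
```

`abort_procedure` returns true only if the procedure accepted the abort request; for a TruncateTableProcedure stuck mid-state it typically returns false, which matches the behaviour described above.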

What might be the issue here, and how can I kill the stuck procedure and get the table working again?

Any help is much appreciated.

Regards,
Thomas

On checking the Master log, I can see the following warnings:

2020-01-08 11:03:28,168 WARN  [RegionOpenAndInitThread-buckets_2-10] ipc.Client: interrupted waiting to send rpc request to server
java.lang.InterruptedException
        at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
        at java.util.concurrent.FutureTask.get(FutureTask.java:191)
        at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1094)
        at org.apache.hadoop.ipc.Client.call(Client.java:1457)
        at org.apache.hadoop.ipc.Client.call(Client.java:1398)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:818)
        at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
        at com.sun.proxy.$Proxy17.getFileInfo(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:283)
        at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:283)
        at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2165)
        at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1442)
        at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1438)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1438)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447)
        at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.createRegionOnFileSystem(HRegionFileSystem.java:898)
        at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:6364)
        at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegion(ModifyRegionUtils.java:205)
        at org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:173)
        at org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:170)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
......
2020-01-08 11:03:28,438 DEBUG [WALProcedureStoreSyncThread] wal.WALProcedureStore: Roll new state log: 64132
2020-01-08 11:03:28,683 DEBUG [ProcedureExecutorThread-28] util.FSTableDescriptors: Current tableInfoPath = hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/.tabledesc/.tableinfo.0000000001
2020-01-08 11:03:28,685 DEBUG [ProcedureExecutorThread-28] util.FSTableDescriptors: TableInfo already exists.. Skipping creation
2020-01-08 11:03:28,685 INFO  [RegionOpenAndInitThread-buckets_2-1] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2
2020-01-08 11:03:28,685 INFO  [RegionOpenAndInitThread-buckets_2-2] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2
2020-01-08 11:03:28,685 INFO  [RegionOpenAndInitThread-buckets_2-3] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2
2020-01-08 11:03:28,685 INFO  [RegionOpenAndInitThread-buckets_2-4] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2
2020-01-08 11:03:28,686 INFO  [RegionOpenAndInitThread-buckets_2-5] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2
....
2020-01-08 11:03:28,686 WARN  [RegionOpenAndInitThread-buckets_2-3] regionserver.HRegionFileSystem: Trying to create a region that already exists on disk: hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/2e97482e25a07a7eb17a113535474057
2020-01-08 11:03:28,686 WARN  [RegionOpenAndInitThread-buckets_2-2] regionserver.HRegionFileSystem: Trying to create a region that already exists on disk: hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/fbfd55d8d1852193a22875679852b1f2
2020-01-08 11:03:28,686 INFO  [RegionOpenAndInitThread-buckets_2-3] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2
2020-01-08 11:03:28,686 INFO  [RegionOpenAndInitThread-buckets_2-2] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2
2020-01-08 11:03:28,686 WARN  [ProcedureExecutorThread-28] procedure.TruncateTableProcedure: Retriable error trying to truncate table=buckets_2 state=TRUNCATE_TABLE_CREATE_FS_LAYOUT
java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: The specified region already exists on disk: hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/2e97482e25a07a7eb17a113535474057
        at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:186)
        at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:141)
        at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:118)
        at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure$3.createHdfsRegions(CreateTableProcedure.java:361)
        at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.createFsLayout(CreateTableProcedure.java:380)
        at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.createFsLayout(CreateTableProcedure.java:354)
        at org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure.executeFromState(TruncateTableProcedure.java:113)
        at org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure.executeFromState(TruncateTableProcedure.java:47)
        at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:107)
        at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:500)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1086)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:888)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:841)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:77)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.run(ProcedureExecutor.java:443)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: The specified region already exists on disk: hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/2e97482e25a07a7eb17a113535474057
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:180)
        ... 14 more
Caused by: java.io.IOException: The specified region already exists on disk: hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/2e97482e25a07a7eb17a113535474057
        at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.createRegionOnFileSystem(HRegionFileSystem.java:900)
        at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:6364)
        at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegion(ModifyRegionUtils.java:205)
        at org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:173)
        at org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:170)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2020-01-08 11:03:28,686 WARN  [RegionOpenAndInitThread-buckets_2-5] regionserver.HRegionFileSystem: Trying to create a region that already exists on disk: hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/6b0206739eeaeae1894fd54d36986c6e
2020-01-08 11:03:28,686 WARN  [RegionOpenAndInitThread-buckets_2-2] ipc.Client: interrupted waiting to send rpc request to server
java.lang.InterruptedException
......

1 ACCEPTED SOLUTION

Super Collaborator

Hello @ebythomaspanick 

 

It appears you are hitting HBASE-20616. If you have verified that no other procedures are in RUNNABLE state (except the truncate and enable for the concerned table), sidelining the MasterProcWALs and clearing the temp directory "/apps/hbase/data/.tmp" will ensure the TruncateTableProcedure isn't retried. Stop both Masters (Active & Standby) during this step to avoid any issues.
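The steps above can be sketched as follows. This assumes an HDP-style layout where the HBase root directory is /apps/hbase/data (consistent with the paths in the logs); service-stop mechanics and exact paths may differ on your cluster, so treat this as a sketch, not a definitive runbook:

```shell
# 1. Stop both HBase Masters (Active and Standby) first, e.g. via Ambari
#    or your cluster manager, so no procedure is replayed mid-change.

# 2. Sideline (move, do not delete) the MasterProcWALs so the stuck
#    TruncateTableProcedure is not replayed when the Masters restart.
#    The .sidelined directory name is an arbitrary choice.
hdfs dfs -mkdir -p /apps/hbase/data/MasterProcWALs.sidelined
hdfs dfs -mv '/apps/hbase/data/MasterProcWALs/*' \
    /apps/hbase/data/MasterProcWALs.sidelined/

# 3. Clear the temp directory holding the half-created regions that
#    trigger the "region already exists on disk" retries in the logs.
hdfs dfs -rm -r -skipTrash /apps/hbase/data/.tmp/data/default/buckets_2

# 4. Restart the Masters, then confirm from the HBase shell that no
#    procedures remain stuck:  list_procedures
```

Sidelining rather than deleting the MasterProcWALs is deliberate: if anything unexpected happens on restart, the original procedure state can still be restored.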

 

- Smarak

