Created on 01-08-2020 02:27 AM - last edited on 01-08-2020 02:42 AM by VidyaSargur
Hi,
I have a cluster running HBase with 1 Master and 6 RegionServers. Recently I noticed that HBase commands are queuing up and possibly hung (in RUNNABLE state) for one table.
When checking the state of the table, I can see that it is disabled. Enabling the table from the HBase shell fails, stating that procedure id xxx (a previous truncate command) is already running. In the HBase UI, both the truncate and enable commands (shown under Procedures) are in RUNNABLE state.
I tried the kill procedure command on the truncate procedure, but it returns false, indicating the procedure cannot be killed.
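For reference, the attempt was roughly the following shell session (a sketch: the procedure id 4517 below is hypothetical and stands in for the actual Proc Id from the Master UI; the command is `abort_procedure` in HBase 1.x/2.0 shells):

```shell
# Inside the HBase shell. The pid 4517 is hypothetical -- substitute the
# Proc Id shown for the stuck truncate under Procedures in the Master UI.
hbase shell <<'EOF'
list_procedures
abort_procedure 4517
EOF
# abort_procedure prints true if the abort request was accepted,
# false if the procedure cannot be aborted (as seen here).
```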
Data on other tables is intact, and scan commands on those tables work fine.
What might be the issue here, and how can I kill the command running in HBase and get the table working again?
Any help is much appreciated.
Regards,
Thomas
On checking the Master log, I can see the following Warnings:
2020-01-08 11:03:28,168 WARN [RegionOpenAndInitThread-buckets_2-10] ipc.Client: interrupted waiting to send rpc request to server
java.lang.InterruptedException
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1094)
at org.apache.hadoop.ipc.Client.call(Client.java:1457)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:818)
at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
at com.sun.proxy.$Proxy17.getFileInfo(Unknown Source)
at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:283)
at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)
at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:283)
at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2165)
at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1442)
at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1438)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1438)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447)
at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.createRegionOnFileSystem(HRegionFileSystem.java:898)
at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:6364)
at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegion(ModifyRegionUtils.java:205)
at org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:173)
at org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:170)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
......
2020-01-08 11:03:28,438 DEBUG [WALProcedureStoreSyncThread] wal.WALProcedureStore: Roll new state log: 64132
2020-01-08 11:03:28,683 DEBUG [ProcedureExecutorThread-28] util.FSTableDescriptors: Current tableInfoPath = hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/.tabledesc/.tableinfo.0000000001
2020-01-08 11:03:28,685 DEBUG [ProcedureExecutorThread-28] util.FSTableDescriptors: TableInfo already exists.. Skipping creation
2020-01-08 11:03:28,685 INFO [RegionOpenAndInitThread-buckets_2-1] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2
2020-01-08 11:03:28,685 INFO [RegionOpenAndInitThread-buckets_2-2] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2
2020-01-08 11:03:28,685 INFO [RegionOpenAndInitThread-buckets_2-3] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2
2020-01-08 11:03:28,685 INFO [RegionOpenAndInitThread-buckets_2-4] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2
2020-01-08 11:03:28,686 INFO [RegionOpenAndInitThread-buckets_2-5] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2
....
2020-01-08 11:03:28,686 WARN [RegionOpenAndInitThread-buckets_2-3] regionserver.HRegionFileSystem: Trying to create a region that already exists on disk: hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/2e97482e25a07a7eb17a113535474057
2020-01-08 11:03:28,686 WARN [RegionOpenAndInitThread-buckets_2-2] regionserver.HRegionFileSystem: Trying to create a region that already exists on disk: hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/fbfd55d8d1852193a22875679852b1f2
2020-01-08 11:03:28,686 INFO [RegionOpenAndInitThread-buckets_2-3] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2
2020-01-08 11:03:28,686 INFO [RegionOpenAndInitThread-buckets_2-2] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2
2020-01-08 11:03:28,686 WARN [ProcedureExecutorThread-28] procedure.TruncateTableProcedure: Retriable error trying to truncate table=buckets_2 state=TRUNCATE_TABLE_CREATE_FS_LAYOUT
java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: The specified region already exists on disk: hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/2e97482e25a07a7eb17a113535474057
at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:186)
at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:141)
at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:118)
at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure$3.createHdfsRegions(CreateTableProcedure.java:361)
at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.createFsLayout(CreateTableProcedure.java:380)
at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.createFsLayout(CreateTableProcedure.java:354)
at org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure.executeFromState(TruncateTableProcedure.java:113)
at org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure.executeFromState(TruncateTableProcedure.java:47)
at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:107)
at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:500)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1086)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:888)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:841)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:77)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.run(ProcedureExecutor.java:443)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: The specified region already exists on disk: hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/2e97482e25a07a7eb17a113535474057
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:180)
... 14 more
Caused by: java.io.IOException: The specified region already exists on disk: hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/2e97482e25a07a7eb17a113535474057
at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.createRegionOnFileSystem(HRegionFileSystem.java:900)
at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:6364)
at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegion(ModifyRegionUtils.java:205)
at org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:173)
at org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:170)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-01-08 11:03:28,686 WARN [RegionOpenAndInitThread-buckets_2-5] regionserver.HRegionFileSystem: Trying to create a region that already exists on disk: hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/6b0206739eeaeae1894fd54d36986c6e
2020-01-08 11:03:28,686 WARN [RegionOpenAndInitThread-buckets_2-2] ipc.Client: interrupted waiting to send rpc request to server
java.lang.InterruptedException
......
Created on 11-12-2020 10:43 AM - edited 11-12-2020 10:44 AM
Hello @ebythomaspanick
It appears you are hitting HBASE-20616. If you have verified that no other procedures are in RUNNABLE state (except the truncate and enable for the affected table), sidelining the MasterProcWALs and clearing the temp directory "/apps/hbase/data/.tmp" will ensure the TruncateTableProcedure is not retried. Stop both Masters (Active and Standby) before performing these steps to avoid any issues.
- Smarak
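The recovery steps above can be sketched as follows (a sketch, not a definitive runbook: it assumes an HDP-style layout with hbase.rootdir = /apps/hbase/data as shown in the logs; adjust paths and the way you stop/start the Masters to your cluster):

```shell
# 1. Stop both HBase Masters (Active & Standby) first, e.g. via Ambari.

HBASE_ROOT=/apps/hbase/data
SUFFIX=$(date +%Y%m%d)   # date-stamped backup suffix, e.g. 20200112

# 2. Sideline (move, do not delete) the MasterProcWALs so the stuck
#    TruncateTableProcedure is not replayed when the Master restarts.
hdfs dfs -mv "$HBASE_ROOT/MasterProcWALs" "$HBASE_ROOT/MasterProcWALs.bak.$SUFFIX"

# 3. Clear the temp directory holding the half-created regions
#    ("region already exists on disk" errors in the Master log).
hdfs dfs -rm -r -skipTrash "$HBASE_ROOT/.tmp"

# 4. Start the Masters again, then re-enable the table from the HBase shell:
#    hbase shell> enable 'buckets_2'
```

Sidelining rather than deleting the MasterProcWALs keeps a fallback if anything else was in flight.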