Member since: 04-30-2018
Posts: 12
Kudos Received: 0
Solutions: 0
01-09-2020
06:12 AM
One of my HBase tables is stuck in limbo: a truncate command was fired, never completed, and appears to be hung. On running hbck, I can see the following error:
2020-01-09 11:54:55,741 INFO [hbasefsck-pool1-t1] util.HBaseFsck: Region { meta => table_1,,1578390508424.a96068fc6ecaa2c66882df211b274e60., hdfs => hdfs://my-hdfs/apps/hbase/data/data/default/table_1/a96068fc6ecaa2c66882df211b274e60, deployed => , replicaId => 0 } is in META, and in a disabled tabled that is not deployed
2020-01-09 11:54:56,098 DEBUG [main-SendThread(machine1-bd.org.com:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x26f81dc781e001a, packet:: clientPath:null serverPath:null finished:false header:: 23,8 replyHeader:: 23,146028924982,0 request:: '/hbase/table-lock/table_1,F response:: v{'write-master:160000000000016}
2020-01-09 11:54:56,098 DEBUG [main-SendThread(machine1-bd.org.com:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x26f81dc781e001a, packet:: clientPath:null serverPath:null finished:false header:: 24,8 replyHeader:: 24,146028924982,0 request:: '/hbase/table-lock/table_1,F response:: v{'write-master:160000000000016}
2020-01-09 11:54:56,099 INFO [main] lock.ZKInterProcessLockBase: Lock is held by: write-master:160000000000016
2020-01-09 11:54:56,099 DEBUG [main-SendThread(machine1-bd.org.com:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x26f81dc781e001a, packet:: clientPath:null serverPath:null finished:false header:: 25,4 replyHeader:: 25,146028924982,0 request:: '/hbase/table-lock/table_1/write-master:160000000000016,F response:: #ffffffff000146d61737465723a3136303030ffffffb679ffffffd411152b5050425546a14a764656661756c741296275636b6574735f321224a1864656d756331707233322e64652e7072692e6f322e636f6d10ffffff807d18fffffff8ffffffdbffffffb5ffffff8ffffffff82d18ffffff9a12002ae7472756e63617465207461626c6530ffffffb6ffffffbcffffffcfffffff91fffffff82d,s{146028892170,146028892170,1578437434934,1578437434934,0,0,0,103444100363649031,117,0,146028892170}
ERROR: Table lock acquire attempt found:[tableName=default:table_1, lockOwner=mahchine1.org.com,16000,1578432818680, threadId=154, purpose=truncate table, isShared=false, createTime=1578437434934]
I'm trying to get the command to terminate/stop. Can 'hbase hbck -fix' be used for this scenario? It does not seem to work. Any help is much appreciated.
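For reference, here is a rough sketch of what I am considering running next. This assumes the -fixTableLocks option of hbck in HBase 1.x applies to this kind of stale table lock, which I am not certain of:

# report-only run, to confirm the held lock without changing anything
hbase hbck -details

# attempt to clear the long-held table lock (as I understand it, this only
# acts on locks older than hbase.table.lock.expire.ms)
hbase hbck -fixTableLocks

Is that a reasonable next step, or is there a risk of making the table state worse?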
01-08-2020
02:27 AM
Hi, I have a cluster running HBase with 1 Master and 6 RegionServers. Recently I noticed that HBase commands for one table are queuing up and appear to be hung (in RUNNABLE state). When checking the state of the table, I can see that the table is disabled. The command executed from the HBase shell to enable the table failed, stating that process id xxx (the previous truncate command) is already running. In the HBase UI, I can see both the truncate and enable commands (shown under Procedures) in RUNNABLE state. I tried the kill procedure command on the truncate command, but it returns false, indicating the procedure cannot be killed. I have data in other tables, and scan commands on those tables work fine. What might be the issue here, and how can I kill the command running in HBase and get the table back to working order? (A rough sketch of the shell commands I used is included after the log excerpt below.) Any help is much appreciated. Regards, Thomas

On checking the Master log, I can see the following warnings:

2020-01-08 11:03:28,168 WARN [RegionOpenAndInitThread-buckets_2-10] ipc.Client: interrupted waiting to send rpc request to server java.lang.InterruptedException at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404) at java.util.concurrent.FutureTask.get(FutureTask.java:191) at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1094) at org.apache.hadoop.ipc.Client.call(Client.java:1457) at org.apache.hadoop.ipc.Client.call(Client.java:1398) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:818) at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185) at com.sun.proxy.$Proxy17.getFileInfo(Unknown Source) at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:283) at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source) at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:283) at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2165) at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1442) at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1438) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1438) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447) at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.createRegionOnFileSystem(HRegionFileSystem.java:898) at 
org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:6364) at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegion(ModifyRegionUtils.java:205) at org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:173) at org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:170) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) ...... 2020-01-08 11:03:28,438 DEBUG [WALProcedureStoreSyncThread] wal.WALProcedureStore: Roll new state log: 64132 2020-01-08 11:03:28,683 DEBUG [ProcedureExecutorThread-28] util.FSTableDescriptors: Current tableInfoPath = hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/.tabledesc/.tableinfo.0000000001 2020-01-08 11:03:28,685 DEBUG [ProcedureExecutorThread-28] util.FSTableDescriptors: TableInfo already exists.. Skipping creation 2020-01-08 11:03:28,685 INFO [RegionOpenAndInitThread-buckets_2-1] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2 2020-01-08 11:03:28,685 INFO [RegionOpenAndInitThread-buckets_2-2] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2 2020-01-08 11:03:28,685 INFO [RegionOpenAndInitThread-buckets_2-3] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2 2020-01-08 11:03:28,685 INFO [RegionOpenAndInitThread-buckets_2-4] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2 2020-01-08 11:03:28,686 INFO [RegionOpenAndInitThread-buckets_2-5] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == 
buckets_2 .... 2020-01-08 11:03:28,686 WARN [RegionOpenAndInitThread-buckets_2-3] regionserver.HRegionFileSystem: Trying to create a region that already exists on disk: hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/2e97482e25a07a7eb17a113535474057 2020-01-08 11:03:28,686 WARN [RegionOpenAndInitThread-buckets_2-2] regionserver.HRegionFileSystem: Trying to create a region that already exists on disk: hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/fbfd55d8d1852193a22875679852b1f2 2020-01-08 11:03:28,686 INFO [RegionOpenAndInitThread-buckets_2-3] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2 2020-01-08 11:03:28,686 INFO [RegionOpenAndInitThread-buckets_2-2] regionserver.HRegion: creating HRegion buckets_2 HTD == 'buckets_2', {NAME => 'b', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} RootDir = hdfs://my-hdfs/apps/hbase/data/.tmp Table name == buckets_2 2020-01-08 11:03:28,686 WARN [ProcedureExecutorThread-28] procedure.TruncateTableProcedure: Retriable error trying to truncate table=buckets_2 state=TRUNCATE_TABLE_CREATE_FS_LAYOUT java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: The specified region already exists on disk: hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/2e97482e25a07a7eb17a113535474057 at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:186) at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:141) at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:118) at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure$3.createHdfsRegions(CreateTableProcedure.java:361) at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.createFsLayout(CreateTableProcedure.java:380) at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.createFsLayout(CreateTableProcedure.java:354) at org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure.executeFromState(TruncateTableProcedure.java:113) at org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure.executeFromState(TruncateTableProcedure.java:47) at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:107) at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:500) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1086) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:888) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:841) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:77) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.run(ProcedureExecutor.java:443) Caused by: java.util.concurrent.ExecutionException: java.io.IOException: The specified region already exists on disk: 
hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/2e97482e25a07a7eb17a113535474057 at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:180) ... 14 more Caused by: java.io.IOException: The specified region already exists on disk: hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/2e97482e25a07a7eb17a113535474057 at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.createRegionOnFileSystem(HRegionFileSystem.java:900) at org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:6364) at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegion(ModifyRegionUtils.java:205) at org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:173) at org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:170) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2020-01-08 11:03:28,686 WARN [RegionOpenAndInitThread-buckets_2-5] regionserver.HRegionFileSystem: Trying to create a region that already exists on disk: hdfs://my-hdfs/apps/hbase/data/.tmp/data/default/buckets_2/6b0206739eeaeae1894fd54d36986c6e 2020-01-08 11:03:28,686 WARN [RegionOpenAndInitThread-buckets_2-2] ipc.Client: interrupted waiting to send rpc request to server java.lang.InterruptedException ......
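For completeness, here is roughly how I attempted the kill from the HBase shell; the procedure id below is a placeholder for the id shown on the Procedures page, and I may be misremembering the exact invocation:

hbase shell
# list running procedures and note the id of the stuck truncate
list_procedures
# attempt to abort it; this is the call that returned false for me
abort_procedure 123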
Labels: Apache HBase
01-07-2019
09:55 AM
I have seen a couple of articles suggesting that the swappiness setting should be set to 0, or in some cases to a value less than 10. Is there any newer recommendation for the swappiness setting for improved performance?
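For context, this is how I am checking and setting the value today on the worker nodes; the value 1 is just an example, and this assumes a standard Linux sysctl setup:

# check the current value
cat /proc/sys/vm/swappiness

# change it at runtime (does not survive a reboot)
sysctl -w vm.swappiness=1

# persist the setting across reboots
echo "vm.swappiness=1" >> /etc/sysctl.conf
sysctl -p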
Labels: Apache Hadoop
08-18-2018
06:12 PM
If the node is brought back online with a new drive in place of the failed one and the services are started, will that cause any issues for the existing data on the cluster, which has changed in the meantime?
08-18-2018
12:07 PM
I have a couple of worker datanodes, each with multiple drive mount points for HDFS. One of these mount points failed, and the failure happened while the cluster was offline. To avoid any problems, the cluster was brought back online without starting the ambari-agent or the other services on the node with the failed mount point. I am wondering what the best way is to reintegrate this node into the cluster. Will there be any issues or data loss if only the failed mount point is replaced and the ambari-agent and the other services on the node are then started up? Or is there a particular approach to follow?
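For reference, this is roughly how I plan to check the node before and after reintegrating it; this is only a sketch and assumes the data directories are listed in dfs.datanode.data.dir:

# confirm which directories the DataNode is configured to use
hdfs getconf -confKey dfs.datanode.data.dir

# after replacing the drive and starting the ambari-agent and services,
# check the per-node report and the under-replicated block counts
hdfs dfsadmin -report
hdfs fsck / | tail -n 20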
Labels: Apache Hadoop
08-02-2018
08:55 AM
Hi, thanks for your response. The database is Oracle. The estimated latency for any communication (request or response) between the server and the database is 8 ms. Is the server performance factor only about the nodes, or does it also cover communication with the Ambari database?
08-01-2018
02:11 PM
We have a database migration planned where the latency between the database and the cluster is estimated to be around 8 ms. Is there a latency threshold with respect to the Ambari database?
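For what it's worth, the 8 ms figure is only a rough estimate; a simple sanity check from the Ambari server host would be something along these lines (the hostname is a placeholder):

# round-trip network latency from the Ambari server host to the database host
ping -c 10 db-host.example.com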
Tags: ambari-server, latency
Labels: Apache Ambari
06-26-2018
09:30 PM
Hi Geoffrey, thanks for your answer. So there is no issue if the agents are unable to report their status to the server? I was wondering whether the nodes would somehow get out of sync, since they are unable to communicate with the server and inform it that they are alive.
06-26-2018
07:54 PM
Hi, probably a noob query! I wanted to know whether there is any real impact on a functioning cluster if the ambari-server service is stopped. Say, for example, all the services like YARN, HBase, etc. are running, multiple Spark or Flink jobs are running on the cluster, and the ambari-server is stopped. Other than cluster monitoring no longer being possible, does it really affect the execution of Flink/Spark jobs or the ability to access data from HDFS or HBase?
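In other words, would something like the following still behave normally while the server is stopped? (Paths and table names below are placeholders.)

# on the Ambari server host
ambari-server stop

# from a client or worker node, while the server is stopped
hdfs dfs -ls /user
echo "scan 'my_table', {LIMIT => 1}" | hbase shell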
Labels: Apache Ambari
04-30-2018
01:27 PM
Recently I upgraded my Ambari cluster's server and agents from v2.5.0.3 to v2.5.2.0, and I have also performed the schema upgrade using ambari-server upgrade. Following this, I get the following alert for the worker nodes in my cluster:

hdp-select could not properly read /usr/hdp. Check this directory for unexpected contents.
ERROR: Unexpected file/directory found in /usr/hdp: hadoop

I see that yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs contain the paths /usr/hdp/hadoop/yarn/local and /usr/hdp/hadoop/log respectively. These alerts have surfaced only after the upgrade was performed. Any idea what might be the issue? Thanks in advance.
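For reference, this is roughly how I am inspecting the affected workers; it assumes the default HDP client configuration location, and the exact paths may differ on other setups:

# list what hdp-select sees under /usr/hdp
ls /usr/hdp

# show the versions hdp-select itself recognizes
hdp-select versions

# confirm the YARN local/log dir settings that point under /usr/hdp
grep -E -A1 "yarn.nodemanager.(local|log)-dirs" /etc/hadoop/conf/yarn-site.xml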
Labels: Apache Ambari, Apache YARN