Member since 02-13-2019 · 10 Posts · 0 Kudos Received · 0 Solutions
11-30-2021 01:28 AM
Recently we deployed a new HDP cluster with 3 master and 5 worker nodes, but it looks like several components are misplaced, among them the HBase service: an HBase Master was installed on a worker node, and 3 HBase RegionServers are located on the 3 master nodes. So we plan to relocate the misplaced HBase components (the HBase Master must be on a master node and the RegionServers must be on worker nodes). Our first step was to decommission the 3 HBase RegionServers on the master nodes, but this cannot be done because the HBase Master fails to start. We then investigated the HBase Master log and found:
2021-11-30 13:42:24,379 ERROR [master/DCHDPD03:16000:becomeActiveMaster] master.HMaster: Failed to become active master
org.apache.hadoop.hbase.util.FileSystemVersionException: hbase.version file is missing. Is your hbase.rootdir valid? You can restore hbase.version file by running 'HBCK2 filesystem -fix'. See https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:452)
at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:275)
at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:153)
at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:124)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:865)
at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2267)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:586)
at java.lang.Thread.run(Thread.java:745)
2021-11-30 13:42:24,380 ERROR [master/DCHDPD03:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master dchdpd03.dcdms,16000,1638254541977: Unhandled exception. Starting shutdown. *****
org.apache.hadoop.hbase.util.FileSystemVersionException: hbase.version file is missing. Is your hbase.rootdir valid? You can restore hbase.version file by running 'HBCK2 filesystem -fix'. See https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:452)
at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:275)
at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:153)
at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:124)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:865)
at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2267)
at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:586)
at java.lang.Thread.run(Thread.java:745)
2021-11-30 13:42:24,380 INFO [master/DCHDPD03:16000:becomeActiveMaster] regionserver.HRegionServer: ***** STOPPING region server 'dchdpd03.dcdms,16000,1638254541977' *****
2021-11-30 13:42:24,380 INFO [master/DCHDPD03:16000:becomeActiveMaster] regionserver.HRegionServer: STOPPED: Stopped by master/DCHDPD03:16000:becomeActiveMaster
2021-11-30 13:42:27,268 INFO [master/DCHDPD03:16000] ipc.NettyRpcServer: Stopping server on /10.0.45.16:16000
2021-11-30 13:42:27,275 INFO [master/DCHDPD03:16000] regionserver.HRegionServer: Stopping infoServer
2021-11-30 13:42:27,282 INFO [master/DCHDPD03:16000] handler.ContextHandler: Stopped o.e.j.w.WebAppContext@47ac613b{/,null,UNAVAILABLE}{file:/usr/hdp/3.1.5.0-152/hbase/hbase-webapps/master}
2021-11-30 13:42:27,287 INFO [master/DCHDPD03:16000] server.AbstractConnector: Stopped ServerConnector@727320fa{HTTP/1.1,[http/1.1]}{0.0.0.0:16010}
2021-11-30 13:42:27,287 INFO [master/DCHDPD03:16000] handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@37c41ec0{/static,file:///usr/hdp/3.1.5.0-152/hbase/hbase-webapps/static/,UNAVAILABLE}
2021-11-30 13:42:27,287 INFO [master/DCHDPD03:16000] handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@77c233af{/logs,file:///var/log/hbase/,UNAVAILABLE}
2021-11-30 13:42:27,288 INFO [master/DCHDPD03:16000] regionserver.HRegionServer: aborting server dchdpd03.dcdms,16000,1638254541977
2021-11-30 13:42:27,288 INFO [master/DCHDPD03:16000] regionserver.HRegionServer: stopping server dchdpd03.dcdms,16000,1638254541977; all regions closed.
2021-11-30 13:42:27,288 INFO [master/DCHDPD03:16000] hbase.ChoreService: Chore service for: master/DCHDPD03:16000 had [] on shutdown
2021-11-30 13:42:27,289 WARN [master/DCHDPD03:16000] master.ActiveMasterManager: Failed get of master address: java.io.IOException: Can't get master address from ZooKeeper; znode data == null
2021-11-30 13:42:27,291 INFO [master/DCHDPD03:16000] zookeeper.ZooKeeper: Session: 0x37d5ab0a503000e closed
2021-11-30 13:42:27,291 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2021-11-30 13:42:27,291 INFO [master/DCHDPD03:16000] regionserver.HRegionServer: Exiting; stopping=dchdpd03.dcdms,16000,1638254541977; zookeeper connection closed.
2021-11-30 13:42:27,291 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: HMaster Aborted
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:244)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3109)
The hbase.version file is missing, so we tried to recover it using HBCK2, but that failed too and threw the errors below:
14:14:10.011 [main] WARN org.apache.hadoop.hbase.client.ConnectionImplementation - Retrieve cluster id failed
java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/hbaseid
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_112]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) ~[?:1.8.0_112]
at org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:549) [hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:287) [hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) [?:1.8.0_112]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) [?:1.8.0_112]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) [?:1.8.0_112]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) [?:1.8.0_112]
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:220) [hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:115) [hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
at org.apache.hbase.HBCK2.connect(HBCK2.java:839) [hbase-hbck2-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
at org.apache.hbase.HBCK2.doCommandLine(HBCK2.java:932) [hbase-hbck2-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
at org.apache.hbase.HBCK2.run(HBCK2.java:830) [hbase-hbck2-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) [hadoop-common-3.1.1.3.1.5.0-152.jar:?]
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) [hadoop-common-3.1.1.3.1.5.0-152.jar:?]
at org.apache.hbase.HBCK2.main(HBCK2.java:1145) [hbase-hbck2-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
Caused by: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/hbaseid
at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException.create(KeeperException.java:111) ~[hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:177) ~[hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:342) ~[hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:2.1.6.3.1.5.0-152]
at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_112]
14:14:14.241 [main] INFO org.apache.hadoop.hbase.client.RpcRetryingCallerImpl - Call exception, tries=6, retries=36, started=4139 ms ago, cancelled=false, msg=java.io.IOException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master, details=, see https://s.apache.org/timeout
...
...
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Tue Nov 30 14:14:10 WIB 2021, RpcRetryingCaller{globalStartTime=1638256450102, pause=100, maxAttempts=36}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master
Tue Nov 30 14:14:10 WIB 2021, RpcRetryingCaller{globalStartTime=1638256450102, pause=100, maxAttempts=36}, org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master
...
...
14:23:00.842 [ReadOnlyZKClient-dchdpm01.dcdms:2181,dchdpm02.dcdms:2181,dchdpm03.dcdms:2181@0x5c7933ad] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Session: 0x37d5ab0a5030012 closed
14:23:00.842 [ReadOnlyZKClient-dchdpm01.dcdms:2181,dchdpm02.dcdms:2181,dchdpm03.dcdms:2181@0x5c7933ad-EventThread] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn - EventThread shut down
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3089)
at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3081)
at org.apache.hadoop.hbase.client.HBaseAdmin.getClusterMetrics(HBaseAdmin.java:2117)
at org.apache.hbase.HBCK2.checkHBCKSupport(HBCK2.java:149)
at org.apache.hbase.HBCK2.doCommandLine(HBCK2.java:933)
at org.apache.hbase.HBCK2.run(HBCK2.java:830)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hbase.HBCK2.main(HBCK2.java:1145)
Caused by: org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master
at org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionImplementation.java:1175)
at org.apache.hadoop.hbase.client.ConnectionImplementation.getKeepAliveMasterService(ConnectionImplementation.java:1234)
at org.apache.hadoop.hbase.client.ConnectionImplementation.getMaster(ConnectionImplementation.java:1223)
at org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:57)
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
... 9 more
Caused by: java.io.IOException: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master
at org.apache.hadoop.hbase.client.ConnectionImplementation.get(ConnectionImplementation.java:2012)
at org.apache.hadoop.hbase.client.ConnectionImplementation.access$500(ConnectionImplementation.java:138)
at org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceStubMaker.makeStubNoRetries(ConnectionImplementation.java:1136)
at org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionImplementation.java:1169)
... 13 more
Caused by: org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase-unsecure/master
at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:177)
at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:342)
at java.lang.Thread.run(Thread.java:745)
It seems there are no znodes for /hbase-unsecure/hbaseid and /hbase-unsecure/master. When we searched for this problem, the advice was that the HBase Master needs to be active, but in reality we still cannot start the HBase Master. Is there any step we missed, or any troubleshooting we can try? Regards
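For what it's worth, a rough checklist sketch for this situation. The paths and znode names below are HDP defaults and assumptions on my part (hbase.rootdir=/apps/hbase/data, zookeeper.znode.parent=/hbase-unsecure); verify both against your hbase-site.xml before running anything. The commands are guarded so the sketch is a no-op on a host without the Hadoop CLIs:

```shell
# Assumed HDP defaults; confirm hbase.rootdir and zookeeper.znode.parent
# in hbase-site.xml before running.
if command -v hdfs >/dev/null 2>&1; then
  # 1) Confirm hbase.version really is missing from the rootdir:
  hdfs dfs -ls /apps/hbase/data
  # 2) Check whether a deleted copy survives in the HDFS trash; if present,
  #    restore it with: hdfs dfs -cp <trash-path>/hbase.version /apps/hbase/data/
  hdfs dfs -ls /user/hbase/.Trash/Current/apps/hbase/data
fi
if command -v hbase >/dev/null 2>&1; then
  # 3) /hbase-unsecure/hbaseid and /hbase-unsecure/master are created by a
  #    running master, so their absence mainly confirms the master never
  #    came up, rather than being a separate fault:
  hbase zkcli ls /hbase-unsecure
fi
# 4) HBCK2's 'filesystem -fix' connects to a live master (hence the NoNode
#    retries above); it can only help once the master is startable again.
```

The key point is the ordering: the hbase.version file must be restored first, because both the master startup and the HBCK2 run in the logs above fail downstream of that missing file.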
01-12-2021 08:46 PM
If we want to limit the interaction of HDP/Hadoop developers, data analysts, and data scientists with the cluster, does that mean we don't need to install the clients on all worker nodes? We have also found a special case: the Sqoop and Oozie clients needed to be installed on all nodes, masters and workers alike. Is that related to how Sqoop and Oozie work?
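One way to audit where a client is currently installed before deciding placement is Ambari's REST API, which can list host components by name. A hedged sketch: the endpoint, cluster name, and credentials below are placeholders, and the component names (SQOOP for the Sqoop client, OOZIE_CLIENT for Oozie) are the usual Ambari ones but worth confirming against your stack definition:

```shell
# Placeholders: ambari-server, prod, admin:admin — substitute your own.
# Lists the hosts carrying the Sqoop client; swap component_name for
# OOZIE_CLIENT, SPARK2_CLIENT, etc. to audit other clients.
URL='http://ambari-server:8080/api/v1/clusters/prod/host_components?HostRoles/component_name=SQOOP'
echo "GET $URL"
# curl -s -u admin:admin "$URL" | grep '"host_name"'
```

Filtering the JSON for "host_name" gives a quick host list without needing a JSON parser on the gateway node.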
01-12-2021 01:57 AM
We want to deploy HDP 3.1.5 in a production environment. We have 3 servers for master nodes and 6 servers for worker nodes, and we have planned the component layout across these 9 nodes, but we want to make sure where we need to place the service clients below.
1. YARN clients: We first planned to install this on all 9 nodes; is that okay, or should we install it only on the 3 master nodes? As far as we know, YARN is needed on all nodes, including ResourceManagers and NodeManagers. Or is the client just needed to launch YARN apps, or something else?
2. MapReduce2 clients: Same as above, we plan to install it on all 9 nodes because it is required for MapReduce jobs. Do we need to install it across all 9 nodes?
3. Hive clients: We plan to install it on the 3 master nodes, or do we just need it on one master node? Is it only needed to submit Hive queries from Beeline (CLI)?
4. Infra Solr clients: We plan to install it on all 9 nodes, but we don't know enough about how this client works.
5. Kerberos clients: Do all nodes need the Kerberos client? It was automatically installed across all nodes when we deployed the development environment.
6. Oozie clients: Same as the Infra Solr clients, all 9 nodes (planned).
7. Pig clients: We plan to install it on the 3 master nodes only; is this related to running Pig via the CLI or submitting Pig applications?
8. Spark2 clients: We plan to install it on one master node because we want only one server to be able to submit Spark apps. But in the development environment it is installed on all nodes; how do we uninstall the Spark2 client on the worker nodes?
9. Sqoop clients: Same as point 8, only on one master node.
10. Tez clients: We plan to install it on all 9 nodes, but we don't have any information about how this client works.
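On point 8 (removing the Spark2 client from worker nodes): Ambari has no uninstall button for clients in the UI, but its REST API can delete a host component. A hedged sketch, where ambari-server, prod, worker01.example.com, and admin:admin are all placeholders to substitute with your own values:

```shell
AMBARI=http://ambari-server:8080   # placeholder Ambari endpoint
CLUSTER=prod                       # placeholder cluster name
HOST=worker01.example.com          # placeholder worker host
URL="$AMBARI/api/v1/clusters/$CLUSTER/hosts/$HOST/host_components/SPARK2_CLIENT"
echo "DELETE $URL"
# Clients have no running daemon, so the host component can be deleted
# directly (no stop step needed):
# curl -u admin:admin -H 'X-Requested-By: ambari' -X DELETE "$URL"
```

Deleting the host component removes it from Ambari's view; the packages themselves may remain on disk, which is usually harmless since nothing references them afterwards.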
07-29-2020 12:08 PM
The documentation says: "Move ats-hbase from the default queue to the yarn-system queue: yarn application -changeQueue yarn-system -appId <app-id>. Here, <app-id> is the ID of the ats-hbase service."
I am stuck at this step and the alert notification is still showing, even though I have restarted YARN and refreshed the YARN Capacity Scheduler. How can I find the app-id? I have tried "yarn app -status ats-hbase" but it returns:
20/07/30 01:42:49 INFO client.RMProxy: Connecting to ResourceManager at hdpdev02.bps.go.id/10.0.45.112:8050
20/07/30 01:42:49 INFO client.AHSProxy: Connecting to Application History server at hdpdev02.bps.go.id/10.0.45.112:10200
20/07/30 01:42:50 INFO client.RMProxy: Connecting to ResourceManager at hdpdev02.bps.go.id/10.0.45.112:8050
20/07/30 01:42:50 INFO client.AHSProxy: Connecting to Application History server at hdpdev02.bps.go.id/10.0.45.112:10200
ats-hbase Failed : HTTP error code : 500
NB: The cluster is a Kerberized cluster.
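A sketch of one way to get the app-id without "yarn app -status" (which goes through the service REST endpoint that is returning the HTTP 500 here): the plain application listing only talks to the ResourceManager, so it may work even when the status call fails. In a Kerberized cluster you would authenticate first; the keytab path below is the usual HDP location and an assumption:

```shell
# Assumed keytab location; adjust principal/path for your cluster:
#   kinit -kt /etc/security/keytabs/yarn-ats.hbase-client.headless.keytab yarn-ats-<cluster>
# List running YARN apps and pull the application ID for ats-hbase
# (guarded so this is a no-op on hosts without the yarn CLI):
if command -v yarn >/dev/null 2>&1; then
  APP_ID=$(yarn application -list -appStates RUNNING 2>/dev/null \
    | grep -w ats-hbase | awk '{print $1}')
  echo "ats-hbase application id: ${APP_ID}"
  # Then move it to the yarn-system queue:
  # yarn application -changeQueue yarn-system -appId "${APP_ID}"
fi
```

If the listing shows no ats-hbase application at all, the service never started after the restart, which would also explain the lingering alert.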