Member since: 05-30-2019
Posts: 86
Kudos Received: 1
Solutions: 1

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1672 | 11-21-2019 10:59 AM |
02-10-2022 12:31 PM
Hi,

We are not able to start a NodeManager. We get the following error when we try to start the component:

2022-02-10 10:14:06,375 ERROR Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 1 missing files; e.g.: /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/187365.sst
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:285)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:358)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 1 missing files; e.g.: /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/187365.sst
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:1543)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1531)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:353)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    ... 5 more

Could you please help us solve this issue? Thank you
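For reference, would something along these lines be a safe way to reset the state store? The corruption seems to be in the NodeManager recovery store (the LevelDB under the path in the trace), not in application data, so the idea is to move that directory aside so it gets rebuilt on the next start. This is only a rough Python sketch, assuming the path from the log above and that the NodeManager is stopped; the ".corrupt.<timestamp>" backup suffix is our own convention:

```python
# Rough sketch (untested): move the corrupt NodeManager recovery store aside so
# it is rebuilt on the next NodeManager start. The path comes from the error
# above; the ".corrupt.<timestamp>" backup suffix is our own convention.
import shutil
import time
from pathlib import Path

STATE_DIR = Path("/var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state")

def backup_state_store(state_dir: Path = STATE_DIR) -> Path:
    """Rename the yarn-nm-state directory instead of deleting it, so it can be restored."""
    backup = state_dir.with_name(state_dir.name + ".corrupt." + time.strftime("%Y%m%d%H%M%S"))
    shutil.move(str(state_dir), str(backup))
    return backup

if __name__ == "__main__":
    print("Moved state store to", backup_state_store())
```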
02-10-2022 10:25 AM
Hi @Daming Xue, thank you for your help. It seems that one of the disks on the DataNode host has an issue. We currently have 6 worker nodes (DataNodes) in our cluster, and each node has 6 attached disks (1 to 6) of 3 TB and 1 attached disk (7) of 1 TB. With one DataNode dead, we are now at 86% DFS used, with a large number of under-replicated blocks. We are trying to increase the available space by dynamically growing disk 7 from 1 TB to 3 TB, and we would like to do that before bringing the DataNode back up. Do you know whether proceeding this way could have an impact? Thank you
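For context, these are the back-of-the-envelope capacity numbers behind the question, based on the layout described above (6 worker nodes with 6 x 3 TB disks plus one 1 TB disk each); the assumption that disk 7 is grown on the 5 live nodes only is ours:

```python
# Back-of-the-envelope raw capacity, based on the layout described above:
# 6 worker nodes, each with 6 x 3 TB disks (1-6) plus 1 x 1 TB disk (7).
disks_per_node_tb = [3] * 6 + [1]            # disks 1-6 are 3 TB, disk 7 is 1 TB
node_capacity_tb = sum(disks_per_node_tb)    # 19 TB per node
total_nodes = 6

full_cluster = total_nodes * node_capacity_tb                    # all nodes up
with_one_dead = (total_nodes - 1) * node_capacity_tb             # current situation
# Assumption: disk 7 is grown from 1 TB to 3 TB on the 5 live nodes only.
after_growing_disk7 = (total_nodes - 1) * (node_capacity_tb + 2)

print(f"Raw capacity, all nodes up:        {full_cluster} TB")
print(f"Raw capacity, one node dead:       {with_one_dead} TB")
print(f"Raw capacity after growing disk 7: {after_growing_disk7} TB")
print(f"86% of {with_one_dead} TB already used: {0.86 * with_one_dead:.0f} TB")
```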
02-08-2022 12:52 PM
Hi, we have noticed through Ambari that one of the DataNodes has the status "DEAD". It seems that the process of re-replicating its blocks to the remaining DataNodes has started. In the meantime, what would be the proper procedure to restart the dead DataNode? Thank you
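While we wait, would a quick cross-check like the one below be reasonable, to confirm from HDFS itself which node is reported dead before we touch anything? A small sketch around `hdfs dfsadmin -report -dead`; we assume running it as the hdfs user with a valid Kerberos ticket:

```python
# Rough sketch: ask HDFS itself which DataNodes it considers dead, to cross-check
# what Ambari shows before restarting anything.
import subprocess

def dead_datanodes() -> str:
    # 'hdfs dfsadmin -report -dead' prints only the dead nodes; run it as the
    # hdfs user (with a valid Kerberos ticket on a secured cluster).
    return subprocess.run(
        ["hdfs", "dfsadmin", "-report", "-dead"],
        capture_output=True, text=True, check=True,
    ).stdout

if __name__ == "__main__":
    print(dead_datanodes())
```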
01-20-2022 02:44 PM
@pbhagade Could you please give us the procedure to follow? The issue started to appear after replacing the expired certificates installed on each host of the Hadoop cluster. The certificates were renewed, but it seems that multiple services on the cluster are now behaving differently and generating error messages like the ones below:

2022-01-19 14:09:28,000 WARN AuthenticationToken ignored: org.apache.hadoop.security.authentication.util.SignerException: Invalid signature
2022-01-19 14:09:28,000 WARN Authentication exception: GSSException: Failure unspecified at GSS-API level (Mechanism level: Invalid argument (400) - Cannot find key of appropriate type to decrypt AP REP - RC4 with HMAC)
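In case it is useful, this is the kind of check we are running on each host to see whether the keytabs still contain a key of the type mentioned in the error (RC4 with HMAC). It is only a diagnostic sketch; the keytab path is an example and should be adjusted per service:

```python
# Diagnostic sketch: list the encryption types present in a service keytab, to
# check whether a key of the type the error mentions (RC4/arcfour-hmac) is still
# there after the certificate renewal. The keytab path below is only an example.
import subprocess

KEYTAB = "/etc/security/keytabs/spnego.service.keytab"  # example path, adjust per service

def keytab_enctypes(keytab: str = KEYTAB) -> str:
    # 'klist -kte <keytab>' prints every key with its KVNO, timestamp and enctype.
    return subprocess.run(
        ["klist", "-kte", keytab],
        capture_output=True, text=True, check=True,
    ).stdout

if __name__ == "__main__":
    print(keytab_enctypes())
```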
01-19-2022 08:07 PM
@Scharan It seems that there is an issue with the HBase Master, because we get the following error when we try to check the master status via the URL http://xx-xxx-xx-xx04.xxxxx.xx:61310/master-status:

HTTP ERROR 500
Problem accessing /master-status. Reason:
Server Error
Caused by:
org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at org.apache.hadoop.hbase.master.HMaster.isInMaintenanceMode(HMaster.java:2827)
at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:271)
at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:389)
at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:380)
at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:81)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
at org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:112)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.hbase.http.ClickjackingPreventionFilter.doFilter(ClickjackingPreventionFilter.java:48)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1374)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)

However, Ambari shows the status "ACTIVE HBASE MASTER" for this node.
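For what it is worth, we are polling the master UI with a small script to see when it stops answering with "Master is initializing"; a rough sketch only, with the host masked as above and the port taken from the same URL:

```python
# Rough polling sketch: hit /master-status until it stops returning HTTP 500
# ("Master is initializing"). Host masked as in the post; port from the URL above.
import time
import urllib.error
import urllib.request

URL = "http://xx-xxx-xx-xx04.xxxxx.xx:61310/master-status"

def wait_for_master(url: str = URL, attempts: int = 30, delay: int = 10) -> bool:
    for i in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.HTTPError, urllib.error.URLError) as err:
            print(f"attempt {i + 1}: master not ready yet ({err})")
        time.sleep(delay)
    return False

if __name__ == "__main__":
    print("master UI healthy:", wait_for_master())
```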
01-19-2022 04:25 PM
Hi, we are trying to upload a simple file to HDFS using the Ambari Files view. We are getting the following error:

Failed to upload XXXXX_XXX_XX_Xh.xml to /XXX/XXXXX/XXX_2021_01_20
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null
at org.apache.hadoop.hdfs.web.JsonUtilClient.toRemoteException(JsonUtilClient.java:85)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:510)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:135)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:736)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Ab...
(more...)

Could you please help us? Thank you
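To narrow down whether the problem comes from the Files view or from WebHDFS itself, we are considering reproducing the upload directly against the WebHDFS REST API. A rough sketch below; the NameNode host/port and local file name are placeholders, the target path is the masked one from the error, and SPNEGO authentication (needed on a Kerberized cluster) is omitted:

```python
# Rough sketch: reproduce the upload directly over the WebHDFS REST API, to see
# whether the NullPointerException comes from HDFS/WebHDFS or from the Files view.
# Placeholders: NameNode host/port and local file name; SPNEGO auth is omitted.
import urllib.error
import urllib.request

NAMENODE = "http://namenode.example.com:50070"            # placeholder
TARGET = "/XXX/XXXXX/XXX_2021_01_20/XXXXX_XXX_XX_Xh.xml"  # masked path from the error
LOCAL = "XXXXX_XXX_XX_Xh.xml"                             # local file to upload

def webhdfs_upload(local_path: str, hdfs_path: str) -> int:
    # Step 1: CREATE against the NameNode; it should answer 307 with a DataNode Location.
    create_url = f"{NAMENODE}/webhdfs/v1{hdfs_path}?op=CREATE&overwrite=true"
    try:
        urllib.request.urlopen(urllib.request.Request(create_url, method="PUT")).close()
        raise RuntimeError("expected a 307 redirect from the NameNode")
    except urllib.error.HTTPError as e:
        if e.code != 307:
            raise
        datanode_url = e.headers["Location"]
    # Step 2: send the file content to the DataNode URL returned in the redirect.
    with open(local_path, "rb") as f:
        put = urllib.request.Request(datanode_url, data=f.read(), method="PUT")
    with urllib.request.urlopen(put) as resp:
        return resp.status  # 201 Created is expected on success

if __name__ == "__main__":
    print("HTTP status:", webhdfs_upload(LOCAL, TARGET))
```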
Labels:
- Apache Ambari
- HDFS
01-19-2022 08:23 AM
@Scharan All HBase components have been restarted and show green in Ambari, but we get the following warning message in hbase-hbase-master-xx-xxx-x1-xx01.xxxxx.xx.log:

2022-01-19 11:01:11,356 WARN [Thread-24] client.RangerAdminRESTClient: Error getting policies. secureMode=true, user=hbase/xx-xxx-x1-xx01.xxxxx.xx@XXXX.XXXXX.XX (auth:KERBEROS), response={"httpStatusCode":403,"statusCode":0}, serviceName=xxxxx_hbase

We then tried to restart the Metrics Collector, without any success.
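As an extra data point, we are probing the Ranger policy download endpoint for that service name to see whether we get the same 403 outside the plugin. A rough sketch only: the Ranger URL is a placeholder, and a secured plugin normally uses a SPNEGO-authenticated variant of this endpoint, so the result is just a hint, not proof of the root cause:

```python
# Probe sketch: request the HBase policies for the service name from the warning,
# outside of the plugin, and see what status Ranger returns. The Ranger URL is a
# placeholder; a secured plugin normally uses a SPNEGO-authenticated variant of
# this download endpoint, so treat the result only as a hint.
import urllib.error
import urllib.request

RANGER_URL = "http://ranger.example.com:6080"  # placeholder
SERVICE = "xxxxx_hbase"                        # serviceName from the warning above

def probe_policy_download(ranger_url: str = RANGER_URL, service: str = SERVICE) -> int:
    url = f"{ranger_url}/service/plugins/policies/download/{service}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # a 401/403 here mirrors the plugin's "Error getting policies"

if __name__ == "__main__":
    print("HTTP status:", probe_policy_download())
```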
01-19-2022 07:53 AM
We are still not able to start the Ambari Metrics Collector; in its log we keep getting the following messages:

2022-01-18 10:58:02,206 ERROR RECEIVED SIGNAL 15: SIGTERM
2022-01-18 10:58:07,221 WARN Timed out waiting to close instance
java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at org.apache.phoenix.jdbc.PhoenixDriver$1.run(PhoenixDriver.java:101)
2022-01-18 11:00:44,728 WARN Unable to connect to HBase store using Phoenix. org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=16, exceptions: Tue Jan 18 15:58:35 EST 2022, RpcRetryingCaller{globalStartTime=1642539515912, pause=100, maxAttempts=16}, org.apache.hadoop.hbase.PleaseHoldException: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing at org.apache.hadoop.hbase.master.HMaster.checkInitializ.....

However, the HBase Master and RegionServers are all up and running. Could you please help us?
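One thing that was not obvious to us is how long the client retries before printing "Failed after attempts=16". With pause=100 and maxAttempts=16 as in the exception, and assuming HBase's default client backoff table (our assumption, not taken from our configs), the total wait works out roughly as follows:

```python
# Back-of-the-envelope: how long the HBase client retries with pause=100 ms and
# maxAttempts=16, assuming the default HConstants.RETRY_BACKOFF multiplier table
# (an assumption about client defaults, not something taken from our configs).
RETRY_BACKOFF = [1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200]
PAUSE_MS = 100
MAX_ATTEMPTS = 16

def total_retry_ms(pause_ms: int = PAUSE_MS, attempts: int = MAX_ATTEMPTS) -> int:
    total = 0
    for attempt in range(attempts):
        idx = min(attempt, len(RETRY_BACKOFF) - 1)
        total += pause_ms * RETRY_BACKOFF[idx]
    return total

if __name__ == "__main__":
    total = total_retry_ms()
    print(f"~{total} ms ({total / 1000:.0f} s) of retries before the PhoenixIOException")
```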
01-18-2022 08:09 PM
@Scharan The Metrics Collector did not start, and we are getting the same errors we previously shared. Any idea how to fix it? Thank you for your help.
01-18-2022 06:56 PM
Hi @Scharan, the restart process seems to be stuck at 9%, and in ambari-metrics-collector.log we can see the following message:

INFO org.apache.hadoop.hbase.client.RpcRetryingCallerImpl: Call exception, tries=9, retries=16, started=28192 ms ago, cancelled=false, msg=Call to xx-xxx-x1-xx04.xxxxx.xx/xx.x.xx.xx:61320 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xx-xxx-x1-xx04.xxxxx.xx/xx.x.xx.xx:61320, details=row 'SYSTEM:CATALOG' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xx-xxx-x1-xx04.xxxxx.xx,61320,1642560006925, seqNum=-1
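Since this is a plain connection refused on port 61320, we first checked whether anything is listening there at all; a trivial sketch, with the host masked the same way as in the log:

```python
# Trivial connectivity check: is anything listening on the port the collector's
# HBase client is trying to reach? Host masked the same way as in the log.
import socket

HOST = "xx-xxx-x1-xx04.xxxxx.xx"
PORT = 61320

def port_open(host: str = HOST, port: int = PORT, timeout: float = 3.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print(f"{HOST}:{PORT} reachable:", port_open())
```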