Member since: 05-30-2019
Posts: 86
Kudos Received: 1
Solutions: 1

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1672 | 11-21-2019 10:59 AM |
02-10-2022 12:31 PM
Hi,

We are not able to start a NodeManager. We get the following error when we try to start the component:

2022-02-10 10:14:06,375 ERROR Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 1 missing files; e.g.: /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/187365.sst
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:285)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:358)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 1 missing files; e.g.: /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/187365.sst
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:1543)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1531)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:353)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    ... 5 more

Could you please help us solve this issue? Thank you
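For reference, would something along these lines be a safe way to reset the state store? The corruption seems to be in the NodeManager recovery store (the LevelDB under the path in the trace), not in application data, so the idea is to move that directory aside so it gets rebuilt on the next start. This is only a rough Python sketch, assuming the path from the log above and that the NodeManager is stopped; the ".corrupt.<timestamp>" backup suffix is our own convention:

```python
# Rough sketch (untested): move the corrupt NodeManager recovery store aside so
# it is rebuilt on the next NodeManager start. The path comes from the error
# above; the ".corrupt.<timestamp>" backup suffix is our own convention.
import shutil
import time
from pathlib import Path

STATE_DIR = Path("/var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state")

def backup_state_store(state_dir: Path = STATE_DIR) -> Path:
    """Rename the yarn-nm-state directory instead of deleting it, so it can be restored."""
    backup = state_dir.with_name(state_dir.name + ".corrupt." + time.strftime("%Y%m%d%H%M%S"))
    shutil.move(str(state_dir), str(backup))
    return backup

if __name__ == "__main__":
    print("Moved state store to", backup_state_store())
```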
02-10-2022 10:25 AM
Hi @Daming Xue, thank you for your help. It seems that one of the disks on the DataNode host has an issue. We currently have 6 worker nodes (DataNodes) in our cluster, and each node has 6 attached disks (1 to 6) of 3 TB and 1 attached disk (7) of 1 TB. With one DataNode dead, we are now at 86% DFS used, with a large number of under-replicated blocks. We are trying to increase the available space by dynamically growing disk 7 from 1 TB to 3 TB, and we would like to do that before bringing the DataNode back up. Do you know whether proceeding this way could have an impact? Thank you
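For context, these are the back-of-the-envelope capacity numbers behind the question, based on the layout described above (6 worker nodes with 6 x 3 TB disks plus one 1 TB disk each); the assumption that disk 7 is grown on the 5 live nodes only is ours:

```python
# Back-of-the-envelope raw capacity, based on the layout described above:
# 6 worker nodes, each with 6 x 3 TB disks (1-6) plus 1 x 1 TB disk (7).
disks_per_node_tb = [3] * 6 + [1]            # disks 1-6 are 3 TB, disk 7 is 1 TB
node_capacity_tb = sum(disks_per_node_tb)    # 19 TB per node
total_nodes = 6

full_cluster = total_nodes * node_capacity_tb                    # all nodes up
with_one_dead = (total_nodes - 1) * node_capacity_tb             # current situation
# Assumption: disk 7 is grown from 1 TB to 3 TB on the 5 live nodes only.
after_growing_disk7 = (total_nodes - 1) * (node_capacity_tb + 2)

print(f"Raw capacity, all nodes up:        {full_cluster} TB")
print(f"Raw capacity, one node dead:       {with_one_dead} TB")
print(f"Raw capacity after growing disk 7: {after_growing_disk7} TB")
print(f"86% of {with_one_dead} TB already used: {0.86 * with_one_dead:.0f} TB")
```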
02-08-2022 12:52 PM
Hi, we have noticed through Ambari that one of the DataNodes has the status "DEAD". It seems that the process of re-replicating its blocks to the remaining DataNodes has started. In the meantime, what would be the proper procedure to restart the dead DataNode? Thank you
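While we wait, would a quick cross-check like the one below be reasonable, to confirm from HDFS itself which node is reported dead before we touch anything? A small sketch around `hdfs dfsadmin -report -dead`; we assume running it as the hdfs user with a valid Kerberos ticket:

```python
# Rough sketch: ask HDFS itself which DataNodes it considers dead, to cross-check
# what Ambari shows before restarting anything.
import subprocess

def dead_datanodes() -> str:
    # 'hdfs dfsadmin -report -dead' prints only the dead nodes; run it as the
    # hdfs user (with a valid Kerberos ticket on a secured cluster).
    return subprocess.run(
        ["hdfs", "dfsadmin", "-report", "-dead"],
        capture_output=True, text=True, check=True,
    ).stdout

if __name__ == "__main__":
    print(dead_datanodes())
```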
01-20-2022 02:44 PM
@pbhagade Could you please give us the procedure to follow? The issue started to appear after replacing the expired certificates installed on each host of the Hadoop cluster. The certificates were renewed, but it seems that multiple services on the cluster are now behaving differently and generating error messages like the ones below:

2022-01-19 14:09:28,000 WARN AuthenticationToken ignored: org.apache.hadoop.security.authentication.util.SignerException: Invalid signature
2022-01-19 14:09:28,000 WARN Authentication exception: GSSException: Failure unspecified at GSS-API level (Mechanism level: Invalid argument (400) - Cannot find key of appropriate type to decrypt AP REP - RC4 with HMAC)
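In case it is useful, this is the kind of check we are running on each host to see whether the keytabs still contain a key of the type mentioned in the error (RC4 with HMAC). It is only a diagnostic sketch; the keytab path is an example and should be adjusted per service:

```python
# Diagnostic sketch: list the encryption types present in a service keytab, to
# check whether a key of the type the error mentions (RC4/arcfour-hmac) is still
# there after the certificate renewal. The keytab path below is only an example.
import subprocess

KEYTAB = "/etc/security/keytabs/spnego.service.keytab"  # example path, adjust per service

def keytab_enctypes(keytab: str = KEYTAB) -> str:
    # 'klist -kte <keytab>' prints every key with its KVNO, timestamp and enctype.
    return subprocess.run(
        ["klist", "-kte", keytab],
        capture_output=True, text=True, check=True,
    ).stdout

if __name__ == "__main__":
    print(keytab_enctypes())
```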
01-19-2022 08:07 PM
@Scharan It seems that there is an issue with the HBase Master, because we get the following error when we try to check the master status via the URL http://xx-xxx-xx-xx04.xxxxx.xx:61310/master-status:

HTTP ERROR 500
Problem accessing /master-status. Reason:
Server Error
Caused by:
org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at org.apache.hadoop.hbase.master.HMaster.isInMaintenanceMode(HMaster.java:2827)
at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:271)
at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:389)
at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:380)
at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:81)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
at org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:112)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.hbase.http.ClickjackingPreventionFilter.doFilter(ClickjackingPreventionFilter.java:48)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1374)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)

However, Ambari shows the status "ACTIVE HBASE MASTER" for this node.
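For what it is worth, we are polling the master UI with a small script to see when it stops answering with "Master is initializing"; a rough sketch only, with the host masked as above and the port taken from the same URL:

```python
# Rough polling sketch: hit /master-status until it stops returning HTTP 500
# ("Master is initializing"). Host masked as in the post; port from the URL above.
import time
import urllib.error
import urllib.request

URL = "http://xx-xxx-xx-xx04.xxxxx.xx:61310/master-status"

def wait_for_master(url: str = URL, attempts: int = 30, delay: int = 10) -> bool:
    for i in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.HTTPError, urllib.error.URLError) as err:
            print(f"attempt {i + 1}: master not ready yet ({err})")
        time.sleep(delay)
    return False

if __name__ == "__main__":
    print("master UI healthy:", wait_for_master())
```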
01-19-2022 04:25 PM
Hi, we are trying to upload a simple file to HDFS using the Ambari Files view. We are getting the following error:

Failed to upload XXXXX_XXX_XX_Xh.xml to /XXX/XXXXX/XXX_2021_01_20
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null
at org.apache.hadoop.hdfs.web.JsonUtilClient.toRemoteException(JsonUtilClient.java:85)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:510)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:135)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:736)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$Ab...
(more...)

Could you please help us? Thank you
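To narrow down whether the problem comes from the Files view or from WebHDFS itself, we are considering reproducing the upload directly against the WebHDFS REST API. A rough sketch below; the NameNode host/port and local file name are placeholders, the target path is the masked one from the error, and SPNEGO authentication (needed on a Kerberized cluster) is omitted:

```python
# Rough sketch: reproduce the upload directly over the WebHDFS REST API, to see
# whether the NullPointerException comes from HDFS/WebHDFS or from the Files view.
# Placeholders: NameNode host/port and local file name; SPNEGO auth is omitted.
import urllib.error
import urllib.request

NAMENODE = "http://namenode.example.com:50070"            # placeholder
TARGET = "/XXX/XXXXX/XXX_2021_01_20/XXXXX_XXX_XX_Xh.xml"  # masked path from the error
LOCAL = "XXXXX_XXX_XX_Xh.xml"                             # local file to upload

def webhdfs_upload(local_path: str, hdfs_path: str) -> int:
    # Step 1: CREATE against the NameNode; it should answer 307 with a DataNode Location.
    create_url = f"{NAMENODE}/webhdfs/v1{hdfs_path}?op=CREATE&overwrite=true"
    try:
        urllib.request.urlopen(urllib.request.Request(create_url, method="PUT")).close()
        raise RuntimeError("expected a 307 redirect from the NameNode")
    except urllib.error.HTTPError as e:
        if e.code != 307:
            raise
        datanode_url = e.headers["Location"]
    # Step 2: send the file content to the DataNode URL returned in the redirect.
    with open(local_path, "rb") as f:
        put = urllib.request.Request(datanode_url, data=f.read(), method="PUT")
    with urllib.request.urlopen(put) as resp:
        return resp.status  # 201 Created is expected on success

if __name__ == "__main__":
    print("HTTP status:", webhdfs_upload(LOCAL, TARGET))
```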
Labels:
- Apache Ambari
- HDFS
01-19-2022 08:23 AM
@Scharan All HBase components have been restarted and show green in Ambari, but we get the following warning message in hbase-hbase-master-xx-xxx-x1-xx01.xxxxx.xx.log:

2022-01-19 11:01:11,356 WARN [Thread-24] client.RangerAdminRESTClient: Error getting policies. secureMode=true, user=hbase/xx-xxx-x1-xx01.xxxxx.xx@XXXX.XXXXX.XX (auth:KERBEROS), response={"httpStatusCode":403,"statusCode":0}, serviceName=xxxxx_hbase

We then tried to restart the Metrics Collector, without any success.
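As an extra data point, we are probing the Ranger policy download endpoint for that service name to see whether we get the same 403 outside the plugin. A rough sketch only: the Ranger URL is a placeholder, and a secured plugin normally uses a SPNEGO-authenticated variant of this endpoint, so the result is just a hint, not proof of the root cause:

```python
# Probe sketch: request the HBase policies for the service name from the warning,
# outside of the plugin, and see what status Ranger returns. The Ranger URL is a
# placeholder; a secured plugin normally uses a SPNEGO-authenticated variant of
# this download endpoint, so treat the result only as a hint.
import urllib.error
import urllib.request

RANGER_URL = "http://ranger.example.com:6080"  # placeholder
SERVICE = "xxxxx_hbase"                        # serviceName from the warning above

def probe_policy_download(ranger_url: str = RANGER_URL, service: str = SERVICE) -> int:
    url = f"{ranger_url}/service/plugins/policies/download/{service}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # a 401/403 here mirrors the plugin's "Error getting policies"

if __name__ == "__main__":
    print("HTTP status:", probe_policy_download())
```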
01-19-2022 07:53 AM
We are still not able to start the Ambari Metrics Collector; in its log we keep getting the following messages:

2022-01-18 10:58:02,206 ERROR RECEIVED SIGNAL 15: SIGTERM
2022-01-18 10:58:07,221 WARN Timed out waiting to close instance
java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at org.apache.phoenix.jdbc.PhoenixDriver$1.run(PhoenixDriver.java:101)
2022-01-18 11:00:44,728 WARN Unable to connect to HBase store using Phoenix. org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=16, exceptions: Tue Jan 18 15:58:35 EST 2022, RpcRetryingCaller{globalStartTime=1642539515912, pause=100, maxAttempts=16}, org.apache.hadoop.hbase.PleaseHoldException: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing at org.apache.hadoop.hbase.master.HMaster.checkInitializ.....

However, the HBase Master and RegionServers are all up and running. Could you please help us?
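One thing that was not obvious to us is how long the client retries before printing "Failed after attempts=16". With pause=100 and maxAttempts=16 as in the exception, and assuming HBase's default client backoff table (our assumption, not taken from our configs), the total wait works out roughly as follows:

```python
# Back-of-the-envelope: how long the HBase client retries with pause=100 ms and
# maxAttempts=16, assuming the default HConstants.RETRY_BACKOFF multiplier table
# (an assumption about client defaults, not something taken from our configs).
RETRY_BACKOFF = [1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200]
PAUSE_MS = 100
MAX_ATTEMPTS = 16

def total_retry_ms(pause_ms: int = PAUSE_MS, attempts: int = MAX_ATTEMPTS) -> int:
    total = 0
    for attempt in range(attempts):
        idx = min(attempt, len(RETRY_BACKOFF) - 1)
        total += pause_ms * RETRY_BACKOFF[idx]
    return total

if __name__ == "__main__":
    total = total_retry_ms()
    print(f"~{total} ms ({total / 1000:.0f} s) of retries before the PhoenixIOException")
```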
01-18-2022 08:09 PM
@Scharan The Metrics Collector did not start, and we are getting the same errors we previously shared. Any idea how to fix it? Thank you for your help.
01-18-2022 06:56 PM
Hi @Scharan, the restart process seems to be stuck at 9%, and in ambari-metrics-collector.log we can see the following message:

INFO org.apache.hadoop.hbase.client.RpcRetryingCallerImpl: Call exception, tries=9, retries=16, started=28192 ms ago, cancelled=false, msg=Call to xx-xxx-x1-xx04.xxxxx.xx/xx.x.xx.xx:61320 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xx-xxx-x1-xx04.xxxxx.xx/xx.x.xx.xx:61320, details=row 'SYSTEM:CATALOG' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xx-xxx-x1-xx04.xxxxx.xx,61320,1642560006925, seqNum=-1
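Since this is a plain connection refused on port 61320, we first checked whether anything is listening there at all; a trivial sketch, with the host masked the same way as in the log:

```python
# Trivial connectivity check: is anything listening on the port the collector's
# HBase client is trying to reach? Host masked the same way as in the log.
import socket

HOST = "xx-xxx-x1-xx04.xxxxx.xx"
PORT = 61320

def port_open(host: str = HOST, port: int = PORT, timeout: float = 3.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print(f"{HOST}:{PORT} reachable:", port_open())
```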