Member since: 05-30-2019
Posts: 86
Kudos Received: 1
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 1667 | 11-21-2019 10:59 AM |
03-17-2022 11:54 AM
Hello @Koffi, the Balancer will do the job for you. Please refer to the official docs below before configuring it:
1- Overview of the HDFS Balancer
2- Configuring the Balancer
Was your question answered? Make sure to mark the answer as the accepted solution. If you find a reply useful, say thanks by clicking on the thumbs up button.
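A minimal usage sketch once the Balancer is configured (the threshold value here is an assumption, not taken from the docs above):

# Run the HDFS Balancer as the hdfs user; -threshold 10 allows each
# DataNode's utilization to deviate up to 10% from the cluster average.
su -l hdfs -c "hdfs balancer -threshold 10"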
02-18-2022 06:14 AM
The HBase Masters can be stuck in a standby state if they are still recovering the lease on the MasterProcWALs. You can confirm this by looking at the restart logs of the HBase Masters. Check how many files are in the directory:
hdfs dfs -ls -R /apps/hbase/data/MasterProcWALs
If the count is very high (on the order of 10000 files), the possible solution is to sideline the MasterProcWALs and wait on RecoverLease.
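A minimal sketch of the sidelining step, assuming the WAL path above and a sideline directory under /tmp (the sideline path is my assumption; stop both Masters first and keep the moved files as a backup):

# Move the procedure WALs aside so the Master can initialize without
# replaying them; restore them if anything looks wrong afterwards.
hdfs dfs -mkdir -p /tmp/MasterProcWALs.sideline
hdfs dfs -mv '/apps/hbase/data/MasterProcWALs/*' /tmp/MasterProcWALs.sideline/
# Restart the HBase Masters and watch the logs for RecoverLease progress.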
02-10-2022 06:54 PM
Hi @Koffi, maybe this could be of some use: https://community.cloudera.com/t5/Support-Questions/CDH5-2-yarn-Error-starting-yarn-nodemanagers/td-p/21700
01-20-2022 02:44 PM
@pbhagade Could you please give us the procedure to follow? The issue started to appear after replacing the expired certificates installed on each host of the Hadoop cluster. The certificates were renewed, but it seems that multiple services on the cluster are behaving differently and generating error messages like the ones below:
2022-01-19 14:09:28,000 WARN AuthenticationToken ignored: org.apache.hadoop.security.authentication.util.SignerException: Invalid signature
2022-01-19 14:09:28,000 WARN Authentication exception: GSSException: Failure unspecified at GSS-API level (Mechanism level: Invalid argument (400) - Cannot find key of appropriate type to decrypt AP REP - RC4 with HMAC)
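Given the "Cannot find key of appropriate type to decrypt AP REP - RC4 with HMAC" message, one quick check is to list the encryption types stored in the service keytabs (the keytab path below is only an example, not from this post):

# Print key versions and encryption types stored in a keytab; compare
# them with what clients request (here, a missing or stale RC4-HMAC key).
klist -kte /etc/security/keytabs/spnego.service.keytab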
01-19-2022 08:07 PM
@Scharan It seems that there is an issue with the HBase Master, because we get the following message when we try to check the master-status via the URL http://xx-xxx-xx-xx04.xxxxx.xx:61310/master-status:
HTTP ERROR 500
Problem accessing /master-status. Reason:
Server Error
Caused by:
org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at org.apache.hadoop.hbase.master.HMaster.isInMaintenanceMode(HMaster.java:2827)
at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:271)
at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:389)
at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:380)
at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:81)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
at org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:112)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.hbase.http.ClickjackingPreventionFilter.doFilter(ClickjackingPreventionFilter.java:48)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1374)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)
However, Ambari gives us the status "ACTIVE HBASE MASTER" for this node.
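A command-line cross-check of what Ambari reports (this assumes the hbase client is installed on the node):

# Query the cluster status directly; while the Master is still
# initializing this will also fail or hang, matching the HTTP 500 above.
echo "status" | hbase shell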
01-18-2022 10:19 PM
Hi, Hive mainly uses temporary folders both on the machine running the Hive client and on the default HDFS instance. These folders store per-query temporary/intermediate data sets and are normally cleaned up by the Hive client when the query finishes. However, when a Hive client terminates abnormally, some data may be left behind. The configuration details are as follows:
On the HDFS cluster, the location is set to /tmp/hive- by default and is controlled by the configuration variable hive.exec.scratchdir.
On the client machine, it is hardcoded to /tmp/.
Note that when writing data to a table/partition, Hive first writes to a temporary location on the target table's filesystem (using hive.exec.scratchdir as the temporary location) and then moves the data to the target table. This applies in all cases, whether tables are stored in HDFS (the normal case) or in file systems like S3 or even NFS. Reference: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-TemporaryFolders
Also, for each session a session directory is created under /tmp/<session-id>_resources. To check which sessions are in use from /tmp, cross-reference the session ID in /tmp/<session-id>_resources with the HS2 log. Directories with timestamps older than hive.server2.idle.session.timeout and hive.server2.idle.operation.timeout can therefore be deleted; take the higher of the two values, and anything older should be safe to delete. So you can use a manual script or a scheduled job to clean the temp location at regular intervals, for example a cron'd shell script that removes data older than 30 or 60 days.
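A minimal cleanup sketch under those rules (the 30-day retention and the default /tmp locations are assumptions; verify sessions against the HS2 log before deleting anything):

# Client machine: remove Hive session resource directories older than 30 days.
find /tmp -maxdepth 1 -type d -name '*_resources' -mtime +30 -exec rm -rf {} +
# HDFS scratch space: list and review first, since hdfs dfs has no
# built-in age filter; remove stale per-user directories only after review.
hdfs dfs -ls '/tmp/hive-*'
# hdfs dfs -rm -r -skipTrash /tmp/hive-<user>/<stale-dir>   (after review)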
01-16-2022 09:53 AM
@Koffi This is typical of a rogue process that hasn't released the port, hence the:
Caused by: java.net.BindException: Address already in use
You will need to kill that process:
# kill -9 5356
Then restart the NameNode; that should resolve the issue.
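Before killing it, it is worth confirming which process actually holds the port (8020 is the usual NameNode RPC port, an assumption here; substitute yours):

# Show the PID bound to the NameNode port.
netstat -tulpn | grep 8020
# or, equivalently:
lsof -i :8020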
12-19-2021 11:35 AM
I have executed the command su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start namenode" and I get the following warning on the command line:
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
And the following error in the log:
2021-12-19 14:06:55,554 ERROR namenode.NameNode (NameNode.java:main(1715)) - Failed to start namenode.
org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 0. Expected transaction ID was 274473528
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:226)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:160)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:890)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1090)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:937)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:910)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710)
Caused by: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: got premature end-of-file at txid 274473527; expected file to go up to 274474058
at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:197)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.skipUntil(EditLogInputStream.java:151)
at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:179)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:213)
... 12 more
2021-12-19 14:06:55,557 INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 0. Expected transaction ID was 274473528
2021-12-19 14:06:55,558 INFO namenode.NameNode (LogAdapter.java:info(51)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at XX-XXX-XX-XXXX.XXXXX.XX/XX.X.XX.XX
************************************************************/
One more thing: I went to check the three hosts that have the JournalNodes (nn1, nn2, host3). I ran the following commands:
cd /hadoop/hdfs/journal/<Cluster_name>/current
ll | wc -l
9653
They all have the same number of files.
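A slightly finer-grained check is to compare the newest edit segments themselves on each JournalNode (same path as above; the tail length is arbitrary):

# Edit segment file names embed zero-padded txid ranges, so the newest
# segments sort last; a truncated copy of the in-progress segment on one
# JournalNode would explain the premature EOF at txid 274473527.
ls -l /hadoop/hdfs/journal/<Cluster_name>/current | tail -5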
12-17-2021 08:45 PM
@Koffi From the Ambari UI, are you seeing any HDFS alerts, e.g. for the ZKFailoverController or the JournalNodes? If so, please share the logs.