Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 923 | 06-04-2025 11:36 PM |
| | 1525 | 03-23-2025 05:23 AM |
| | 756 | 03-17-2025 10:18 AM |
| | 2703 | 03-05-2025 01:34 PM |
| | 1801 | 03-03-2025 01:09 PM |
08-24-2021
01:05 PM
@lyash There are a couple of things members need to know to be able to help with your case:
- Which CDH or CDP version? Which OS?
- Your Postgres version, and the document or steps you followed for the setup?
- How much memory is allocated to Hive?
- Can you connect to Postgres locally? i.e. sudo --login --user=postgres
- Can you change hive.metastore.schema.verification to false in hive-site.xml?
Please revert.
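For the local Postgres check, a minimal sketch (assuming the default postgres OS user and that psql is on the PATH; the role/database names in your metastore setup are your own):

$ sudo --login --user=postgres psql -c "\l"    # confirm the server answers locally and list databases
$ sudo --login --user=postgres psql -c "\du"   # list roles to confirm the Hive metastore user exists

If these work locally but the Hive Metastore still cannot connect, the problem is usually pg_hba.conf or listen_addresses rather than the schema itself.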
08-21-2021
04:11 AM
1 Kudo
@mike_bronson7 Can you share your capacity scheduler, total memory, and vcores configs?
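If it is easier, the ResourceManager REST API can dump these in one go (rm-host:8088 below is an assumption; substitute your ResourceManager address and port):

$ curl -s http://rm-host:8088/ws/v1/cluster/scheduler   # queue / capacity-scheduler configuration
$ curl -s http://rm-host:8088/ws/v1/cluster/metrics     # total memory and vcores as YARN sees them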
08-21-2021
03:56 AM
@Nitin0858 Can you share the contents of both so we can help with the analysis?
08-06-2021
01:31 PM
@NIFI_123 Maybe try this crontab generator; it has more possibilities. Hope that helps.
08-01-2021
12:24 PM
@Vinay1991 I mentioned the logs below. You will definitely need the ZKFailoverController (ZKFC) logs and the NameNode logs.
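To capture them while the failover loop is happening, something like the following (paths assume default HDP-style log locations; adjust to your layout):

$ tail -f /var/log/hadoop/hdfs/hadoop-hdfs-zkfc-*.log       # ZKFailoverController
$ tail -f /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log   # NameNode
$ tail -f /var/log/zookeeper/zookeeper*.log                 # ZooKeeper server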
07-30-2021
02:48 PM
@Buithuy96 First and foremost, re-running these steps won't do any damage to your cluster, I assure you. What you have is purely a permissions issue:

java.sql.SQLException: Access denied for user 'ambari'@'mtnode.hdp.vn' (using password: YES)

Revalidate the MySQL connector:

# yum install -y mysql-connector-java
# ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar

Re-run the Ambari user setup:

CREATE USER 'ambari'@'%' IDENTIFIED BY 'Ctct@123';
CREATE USER 'ambari'@'localhost' IDENTIFIED BY 'Ctct@123';
GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'%';
GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'localhost';
GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'mtnode.hdp.vn' IDENTIFIED BY 'Ctct@123';
FLUSH PRIVILEGES;

Then restart Ambari while tailing ambari-server.log and share the contents. Truncate the log first so you have a minimal log to delve through:

# truncate --size 0 /var/log/ambari-server/ambari-server.log

Restart your Ambari server and tail the log:

# tail -f /var/log/ambari-server/ambari-server.log

Share the status.
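Before restarting Ambari, it may be worth confirming the grants from the shell (the database name 'ambari' below is an assumption; use whatever database your setup created):

$ mysql -u ambari -p -h mtnode.hdp.vn -e "SELECT 1;"       # connect as the ambari user from the FQDN
$ mysql -u ambari -p -h localhost ambari -e "SHOW TABLES;" # confirm the Ambari schema is readable

If either command still reports "Access denied", the GRANT statements above did not take effect for that host.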
07-29-2021
01:04 PM
@Vinay1991 The ZKs look okay. Please go through the list I shared about connectivity and validate each item one by one.
07-28-2021
10:30 AM
1 Kudo
@Vinay1991 From the logs, I see connectivity loss, and that's precisely what's causing the NN switch. Remember from my earlier posting the importance of the ZK quorum! Your NN is losing its connection to ZK, so ZK elects a new active NameNode, and that keeps happening in a loop:

Caused by: java.net.SocketTimeoutException: 5000 millis timeout while waiting for channel to be ready for read

I would start by checking the firewall. I see you are on Ubuntu, so ensure the firewall is disabled across the cluster.

Identifying and fixing socket timeouts
The root cause of a socket timeout is a connectivity failure between the machines, so try the usual process:
- Check the settings: is this the machine you really wanted to talk to?
- From the machine that is raising the exception, can you resolve the hostname? Is that resolved hostname the correct one?
- Can you ping the remote host?
- Is the target machine running the relevant Hadoop processes?
- Can you telnet to the target host and port?
- Can you telnet to the target host and port from any other machine?
- On the target machine, can you telnet to the port using localhost as the hostname? If this works but external network connections time out, it's usually a firewall issue.
- If it is a remote object store: is the address correct?
- Does it go away when you repeat the operation? Does it only happen on bulk operations? If the latter, it's probably due to throttling at the far end.

Check your hostname resolution: DNS and /etc/hosts should be in sync, and, just as important, all your hosts' clocks should be in sync. A few of these checks are sketched below. Can you share the value of the core-site.xml parameter ha.zookeeper.quorum?
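A minimal sketch of those checks, using hypothetical hosts (substitute your actual ZooKeeper and NameNode hostnames and ports):

$ getent hosts zk1.example.com               # does the hostname resolve, and to the right address?
$ ping -c 3 zk1.example.com                  # basic reachability
$ telnet zk1.example.com 2181                # can you reach the ZooKeeper client port?
$ echo ruok | nc zk1.example.com 2181        # a healthy ZooKeeper answers "imok"
$ hdfs getconf -confKey ha.zookeeper.quorum  # the quorum value the NameNodes actually use
$ sudo ufw status                            # Ubuntu firewall status on each node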
07-27-2021
12:35 PM
2 Kudos
@Vinay1991 Unfortunately, you haven't described your cluster setup, but my assumption is that you have 3 ZKs in your HA implementation. There are two components deployed to Hadoop HDFS for implementing automatic failover:
- the ZKFailoverController process (ZKFC)
- the ZooKeeper quorum (3 ZKs)

1. ZKFailoverController (ZKFC)
The ZKFC is the ZooKeeper client responsible for managing and monitoring the NameNode state. It runs on every node in the Hadoop cluster that runs a NameNode, and it is responsible for:

Health monitoring: ZKFC periodically heartbeats the NameNode with health-check commands. As long as the NameNode responds with a healthy status in time, it considers the NameNode healthy. If the NameNode has crashed, frozen, or entered an unhealthy state, it marks the NameNode as unhealthy.

ZooKeeper session management: ZKFC also manages the session with ZooKeeper. It keeps a session open in ZooKeeper while the local NameNode is healthy. If the local NameNode is the active NameNode, it also holds a special lock znode in that session. The lock relies on ZooKeeper's support for "ephemeral" nodes, so if the session expires, the lock node is deleted automatically.

ZooKeeper-based election: When the local NameNode is healthy and ZKFC finds that no other NameNode holds the lock znode, it tries to acquire the lock itself. If it succeeds, ZKFC has won the election and is responsible for running a failover to make its local NameNode active. The failover it runs is similar to the manual failover described in the NameNode High Availability article.

2. ZooKeeper quorum
A ZK quorum is a highly available service for maintaining small amounts of coordination data. It notifies clients of changes in that data and monitors clients for failures. The HDFS implementation of automatic failover depends on ZooKeeper for the following:

How it detects NN failure: each NameNode machine in the Hadoop cluster maintains a persistent session in ZooKeeper. If one of the machines crashes, its ZooKeeper session expires, and ZooKeeper then tells the other NameNodes to start the failover process. To exclusively select the active NameNode, ZooKeeper provides a simple mechanism: when the active NameNode fails, another standby NameNode may take the special exclusive lock in ZooKeeper, stating that it should become the next active NameNode.

After the HealthMonitor is initialized, internal threads are started that periodically call the HAServiceProtocol RPC interface of the NameNode to check its health. If the HealthMonitor detects a change in the NameNode's health, it calls back the corresponding method registered by the ZKFailoverController. If the ZKFailoverController decides that an active/standby switch is needed, it first uses the ActiveStandbyElector to conduct an automatic election. The ActiveStandbyElector interacts with ZooKeeper to complete the election and, once it has finished, calls back into the ZKFailoverController to notify the current NameNode whether it should become the active or the standby NameNode. The ZKFailoverController then calls the HAServiceProtocol RPC interface of the NameNode to transition it to the Active or Standby state.

Taking all of the above into account, the first component logs to check are ZK and NN:

/var/log/hadoop/hdfs
/var/log/zookeeper

My suspicion is that you have an issue with the NameNode heartbeat, which makes ZooKeeper fail to get the ping back in time, mark the NN as dead, and elect a new leader, and that keeps happening in a loop. So check those ZK logs, and ensure the time is set correctly and in sync! A couple of quick checks are sketched below. Please revert.
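A quick way to see the current HA state and confirm the clocks are in sync (nn1/nn2 are hypothetical NameNode service IDs from your hdfs-site.xml; timedatectl assumes systemd-based hosts):

$ hdfs haadmin -getServiceState nn1   # reports "active" or "standby"
$ hdfs haadmin -getServiceState nn2
$ timedatectl                         # check "System clock synchronized: yes" on every host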
07-26-2021
10:51 PM
@sipocootap2 Unfortunately, you cannot disallow snapshots on a snapshottable directory that still has snapshots! Yes, you will have to list and delete the snapshots; even if a snapshot contains subdirectories, you only pass the snapshot's root name to the hdfs dfs -deleteSnapshot command. If you had, for example:

$ hdfs dfs -ls /app/tomtest/.snapshot
Found 2 items
drwxr-xr-x - tom developer 0 2021-07-26 23:14 /app/tomtest/.snapshot/sipo/work/john
drwxr-xr-x - tom developer 0 2021-07-26 23:14 /app/tomtest/.snapshot/tap2/work/peter

you would simply delete the snapshots like:

$ hdfs dfs -deleteSnapshot /app/tomtest/ sipo
$ hdfs dfs -deleteSnapshot /app/tomtest/ tap2
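Once both snapshots are deleted, disallowing snapshots on the directory should then succeed:

$ hdfs dfsadmin -disallowSnapshot /app/tomtest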