Member since: 01-19-2017
Posts: 3676
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 612 | 06-04-2025 11:36 PM |
| | 1181 | 03-23-2025 05:23 AM |
| | 585 | 03-17-2025 10:18 AM |
| | 2190 | 03-05-2025 01:34 PM |
| | 1376 | 03-03-2025 01:09 PM |
12-04-2017
11:11 PM
@Michael Bronson Can you validate that the entry was created?

[zk: localhost:2181(CONNECTED) 1] ls /hadoop-ha/hdfsha

Check the lock:

[zk: localhost:2181(CONNECTED) 2] get /hadoop-ha/hdfsha/ActiveStandbyElectorLock

Check the status:

$ hdfs haadmin -getServiceState namenode1
$ hdfs haadmin -getServiceState namenode2

Try to fail over:

$ hdfs haadmin -failover <from-serviceId> <to-serviceId>

Use the commands below to force one NameNode to Active or Standby:

$ hdfs haadmin -transitionToActive <serviceId> --forceactive
or
$ hdfs haadmin -transitionToStandby <serviceId>

Hope that helps
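If it helps, here is a minimal sketch that wraps those checks into one script. It assumes the serviceIds namenode1/namenode2 used above and that it is run as the hdfs user; note that with automatic failover enabled, a manual transition may also require the --forcemanual flag.

```bash
#!/usr/bin/env bash
# Sketch: report the HA state of both NameNodes and, if neither is active,
# force one to Active. serviceIds namenode1/namenode2 are taken from the post;
# adjust them to your nameservice. Run as the hdfs user.
set -euo pipefail

active_found=0
for sid in namenode1 namenode2; do
  state=$(hdfs haadmin -getServiceState "$sid" 2>/dev/null || echo "unreachable")
  echo "$sid is $state"
  if [ "$state" = "active" ]; then
    active_found=1
  fi
done

if [ "$active_found" -eq 0 ]; then
  echo "No active NameNode found; forcing namenode1 to Active"
  # With automatic failover (ZKFC) enabled, --forcemanual may also be required.
  hdfs haadmin -transitionToActive namenode1 --forceactive
fi
```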
12-04-2017
10:34 PM
@Michael Bronson Can you delete the entry in ZooKeeper and restart?

[zk: localhost:2181(CONNECTED) 1] rmr /hadoop-ha

Validate that there is no hadoop-ha entry:

[zk: localhost:2181(CONNECTED) 2] ls /

Then restart all components of the HDFS service. This will create a new znode with the correct lock (held by the ZKFailoverController). Also see https://community.hortonworks.com/questions/12942/how-to-clean-up-files-in-zookeeper-directory.html#
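For reference, a minimal sketch of the same clean-up run non-interactively, assuming zkCli.sh is on the PATH and ZooKeeper listens on localhost:2181 (newer ZooKeeper releases use deleteall instead of rmr):

```bash
# Remove the stale HA znode, then confirm it is gone before restarting HDFS.
zkCli.sh -server localhost:2181 rmr /hadoop-ha

# The root listing should no longer contain hadoop-ha.
zkCli.sh -server localhost:2181 ls /
```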
12-04-2017
10:22 PM
@Michael Bronson

[zk: localhost:2181(CONNECTED) 2] ls /hadoop-ha
[hdfsha]

Next:

[zk: localhost:2181(CONNECTED) 2] get /hadoop-ha/hdfsha/ActiveStandbyElectorLock

What output do you get?
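For context, the ActiveStandbyElectorLock is an ephemeral znode held by the ZKFailoverController of the active NameNode, so a missing or empty node suggests nothing currently holds the lock. A small sketch to capture both outputs in one pass, assuming zkCli.sh is on the PATH and ZooKeeper listens on localhost:2181:

```bash
# List the HA root and read the elector lock in one shot.
zkCli.sh -server localhost:2181 ls /hadoop-ha
zkCli.sh -server localhost:2181 get /hadoop-ha/hdfsha/ActiveStandbyElectorLock
```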
12-04-2017
10:00 PM
@Michael Bronson Can you attach your NameNode log? How does your /etc/hosts entry look?

103.114.28.13 master01.sys4.com
103.114.28.12 master03.sys4.com

or IP / hostname / alias:

103.114.28.13 master01.sys4.com master01
103.114.28.12 master03.sys4.com master03

What is the output of:

$ zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /hadoop-ha

If this cluster is not critical then you might have to go through these steps
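As a quick sanity check on name resolution, here is a minimal sketch using the hostnames from the post (adjust for your cluster); forward lookups and the node's own FQDN should agree with what the NameNode binds to:

```bash
# Check what each master name resolves to via /etc/hosts (or DNS).
for h in master01.sys4.com master03.sys4.com; do
  echo "== $h =="
  getent hosts "$h"
done

# The fully qualified name this node believes it has.
hostname -f
```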
12-04-2017
08:44 PM
@Lukas Müller Can you copy and paste the contents of your /etc/yum.repos.d? The filename /etc/yum.repos.d/ambari-hdp-2.repo doesn't look correct; you should see something like this:

# ls -al /etc/yum.repos.d/
total 56
drwxr-xr-x. 2 root root 4096 Oct 19 13:13 .
......
-rw-r--r-- 1 root root 306 Oct 19 13:04 ambari.repo
-rw-r--r--. 1 root root 575 Aug 30 21:34 hdp.repo
-rw-r--r-- 1 root root 128 Oct 19 13:13 HDP.repo
-rw-r--r-- 1 root root 151 Oct 19 13:13 HDP-UTILS.repo

Please correct that and retry.
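To narrow it down, a small read-only sketch that lists the repo files and shows which repositories yum actually sees (nothing here modifies the system):

```bash
# Show the repo files present on the node.
ls -al /etc/yum.repos.d/

# Show where each repo file points, and which repos yum has enabled.
grep -H 'baseurl' /etc/yum.repos.d/*.repo
yum repolist enabled
```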
12-04-2017
08:07 PM
@Christian Nunez Can you check in the Ambari database whether the hosts have been registered? The below is from MySQL:

mysql> select host_id, host_name, last_registration_time, public_host_name from hosts;

Please let me know. One piece of advice: always open a new thread, because this is a closed thread and members usually ignore it.
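If you prefer a one-liner, here is a sketch of the same check run non-interactively; the database and user names (ambari/ambari) are common defaults and only assumptions, so adjust them to your Ambari server's configuration:

```bash
# Query the Ambari hosts table without opening an interactive mysql session.
# "ambari" as user and database name are assumptions; -p will prompt for the password.
mysql -u ambari -p ambari -e \
  "select host_id, host_name, last_registration_time, public_host_name from hosts;"
```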
12-04-2017
02:15 PM
@Sedat Kestepe Stop the HDFS service if it is running.

Start only the JournalNodes (as they will need to be made aware of the formatting).

On the NameNode (as user hdfs):
# su - hdfs

Format the NameNode:
$ hadoop namenode -format

Initialize the edits (for the JournalNodes):
$ hdfs namenode -initializeSharedEdits -force

Format ZooKeeper (to force ZooKeeper to reinitialise):
$ hdfs zkfc -formatZK -force

Using Ambari, restart the NameNode.

If you are running an HA NameNode, then on the second NameNode sync (force sync with the first NameNode):
$ hdfs namenode -bootstrapStandby -force

On every DataNode clear the data directory (which is already done in your case).

Restart the HDFS service.

Hope that helps
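For convenience, the first-NameNode part of that sequence as a single sketch. It is destructive (it reformats HDFS metadata), so treat it as an outline for a non-critical cluster only, run as the hdfs user after the JournalNodes are up:

```bash
#!/usr/bin/env bash
# DESTRUCTIVE sketch: wipes HDFS metadata and the HA state in ZooKeeper.
# Only for a non-critical cluster; run as the hdfs user with JournalNodes running.
set -euo pipefail

hadoop namenode -format                      # will prompt to confirm re-formatting
hdfs namenode -initializeSharedEdits -force  # push fresh edits to the JournalNodes
hdfs zkfc -formatZK -force                   # recreate the HA znode in ZooKeeper

# Then restart this NameNode from Ambari, and on the second NameNode run:
#   hdfs namenode -bootstrapStandby -force
```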
12-04-2017
01:25 PM
@Michael Bronson If this is a production environment I would advise you to contact Hortonworks support.

How many nodes are in your cluster? How many JournalNodes do you have in the cluster? Make sure you have an odd number. Could you also confirm whether, at any point after enabling HA, the Active and Standby NameNodes ever functioned?

Your log messages indicate a timeout condition when the NameNode attempted to call the JournalNodes. The NameNode must successfully call a quorum of JournalNodes: at least 2 out of 3. This means that the call timed out to at least 2 out of 3 of them. This is a fatal condition for the NameNode, so by design it aborts. There are multiple potential reasons for this timeout condition; reviewing logs from the NameNodes and JournalNodes would likely reveal more details.

If it is a non-critical cluster, you can follow the steps below.

Stop the HDFS service if it is running.

Start only the JournalNodes (as they will need to be made aware of the formatting).

On the first NameNode (as user hdfs):
# su - hdfs

Format the NameNode:
$ hadoop namenode -format

Initialize the edits (for the JournalNodes):
$ hdfs namenode -initializeSharedEdits -force

Format ZooKeeper (to force ZooKeeper to reinitialise):
$ hdfs zkfc -formatZK -force

Using Ambari, restart that first NameNode.

On the second NameNode, sync (force sync with the first NameNode):
$ hdfs namenode -bootstrapStandby -force

On every DataNode clear the data directory.

Restart the HDFS service.

Hope that helps
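Before reformatting anything, it may be worth confirming the NameNode can even reach the JournalNodes. A minimal sketch, using placeholder hostnames and the default JournalNode RPC port 8485 (substitute the hosts from dfs.namenode.shared.edits.dir):

```bash
# Check TCP reachability of each JournalNode from the NameNode host.
# Hostnames below are placeholders; 8485 is the default JournalNode RPC port.
for jn in journalnode1.example.com journalnode2.example.com journalnode3.example.com; do
  if nc -z -w 5 "$jn" 8485; then
    echo "$jn:8485 reachable"
  else
    echo "$jn:8485 NOT reachable"
  fi
done
```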
12-04-2017
12:02 PM
@Michael Bronson From your screenshot, both NameNodes are down, hence the failure of the failover commands. Since you enabled NameNode HA using Ambari, the ZooKeeper service instances and ZooKeeper FailoverControllers need to be up and running. Just restart the NameNodes, although it is bizarre that neither is marked Active or Standby. Depending on whether the cluster is DEV or Prod, please take the appropriate steps to restart the NameNodes, because your cluster is now unusable anyway. In Ambari, use the HDFS "Restart All" command under Service Actions.
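If the UI is unresponsive, the same restart can be driven through the Ambari REST API; the host, port, cluster name and admin credentials below are placeholders/assumptions for illustration:

```bash
# Sketch: stop then start the HDFS service via the Ambari REST API.
# Replace the host, cluster name and credentials with your own values.
AMBARI=http://ambari.example.com:8080
CLUSTER=MYCLUSTER

# Stop HDFS (target state INSTALLED).
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop HDFS"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  "$AMBARI/api/v1/clusters/$CLUSTER/services/HDFS"

# Start HDFS again (target state STARTED).
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Start HDFS"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  "$AMBARI/api/v1/clusters/$CLUSTER/services/HDFS"
```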