Created 06-16-2016 05:00 PM
Greetings!
Either my configuration is wrong or Ambari 2.2.2.1 has a refresh issue.
Basically, I'm running a cluster with a high-availability NameNode (NN) configuration.
For some unknown reason, when NN fails, SNN becomes the active node as expected, and NN goes into standby once its service is restarted.
I can confirm the failover is successful by running hdfs haadmin -getServiceState nn1 and hdfs haadmin -getServiceState nn2. From that point, nn1 reports standby and nn2 reports active, respectively.
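A minimal sketch of that check as a script, assuming the NameNode IDs are nn1 and nn2 as above. The function name check_ha_state is hypothetical; the only HDFS call it makes is the hdfs haadmin -getServiceState command already mentioned.

```shell
#!/bin/sh
# Sketch: print the HA state of each NameNode ID passed as an argument and
# warn when the number of active NameNodes is anything other than one.
# check_ha_state is a hypothetical helper name, not part of HDFS or Ambari.
check_ha_state() {
    active=0
    for nn in "$@"; do
        # haadmin prints "active" or "standby" for the given NameNode ID.
        state=$(hdfs haadmin -getServiceState "$nn" 2>/dev/null)
        echo "$nn: ${state:-unknown}"
        if [ "$state" = "active" ]; then
            active=$((active + 1))
        fi
    done
    if [ "$active" -ne 1 ]; then
        echo "WARNING: expected exactly 1 active NameNode, found $active"
        return 1
    fi
    return 0
}
```

Calling check_ha_state nn1 nn2 on a cluster node reports what HDFS itself believes, independently of what Ambari displays.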
The funky part, however, is that Ambari marks both NameNodes as Active even though the backend has failed over; Ambari should report NN as Standby and SNN as Active.
The DFS can still be written to with the usual hdfs dfs -put test.log <path>/test.log
To force Ambari to refresh the status, I run the following command: echo N | hdfs haadmin -transitionToStandby --forcemanual nn2. After that, nn2 is marked as Standby, nn1 becomes Active, Ambari refreshes to display NN as Active and SNN as Standby, and the world is happy.
From a sysadmin perspective I can write data to the filesystem, so I'm happy and consider this an Ambari bug. For a programmer colleague, however, it causes havoc, as he can't read, write, or modify the file system from Java/API/hdfs://url.
Is this a known issue? Is it expected behaviour? And last but not least, what defines the hdfs://url value? Is there an additional parameter to add to that URL to force a refresh?
thanks!
Eric
Created 06-16-2016 08:27 PM
This does not indicate a problem with the health of HDFS. As you noted, users are still able to read and write files. The problem is limited to the reporting of HA status displayed in Ambari.
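On the hdfs:// URL question: in an HA setup, the URI that clients use by default comes from fs.defaultFS, which normally names the logical nameservice from dfs.nameservices rather than a single NameNode host. A client that hard-codes one NameNode's host and port does not follow a failover and is rejected once that node is in standby. The sketch below is one way to check what a cluster node is configured with. It assumes a single nameservice (no federation), and the function name show_client_uri is hypothetical.

```shell
#!/bin/sh
# Sketch: show the default filesystem URI and the nameservice ID that HDFS
# clients on this node pick up from the local configuration.
# show_client_uri is a hypothetical helper name; a single nameservice is assumed.
show_client_uri() {
    # Default filesystem URI used when a path has no explicit scheme/authority.
    default_fs=$(hdfs getconf -confKey fs.defaultFS)
    # Logical nameservice ID defined for the HA pair.
    nameservices=$(hdfs getconf -confKey dfs.nameservices)
    echo "fs.defaultFS=$default_fs"
    echo "dfs.nameservices=$nameservices"
    # A URI naming the nameservice follows failover; one naming a host does not.
    case "$default_fs" in
        "hdfs://$nameservices"|"hdfs://$nameservices"/*)
            echo "client URI is the logical nameservice" ;;
        *)
            echo "client URI does not match dfs.nameservices" ;;
    esac
}
```

A Java client would typically use the same logical URI and needs the cluster's core-site.xml and hdfs-site.xml on its classpath so that the nameservice can be resolved to nn1 and nn2.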
Created 06-16-2016 09:15 PM
Good to know that I'm not going crazy then....
I have a feeling this is related?
https://issues.apache.org/jira/browse/AMBARI-15235
It's mentioned as fixed, at least partly, in 2.2.2? I'm still on 2.2.1.1.
Created 06-17-2016 05:54 AM
@Eric Periard, no, you are not going crazy. 🙂
You're correct that JIRA issue AMBARI-15235 is related. That's a change that helps on the display side. AMBARI-17603 is another patch that gets more at the root cause of the problem by optimizing the JMX query.
Created on 07-11-2016 01:30 PM - edited 08-19-2019 01:05 AM
Has there been any progress so far on that issue? I've tried so many approaches that I've resorted to writing this script that checks the node status every minute...