Member since 06-08-2016 | 33 Posts | 10 Kudos Received | 1 Solution
07-11-2016
01:03 PM
1 Kudo
Additionally, I configured HA following these two guides:
https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.1/bk_Ambari_Users_Guide/content/_how_to_configure_namenode_high_availability.html and https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_hadoop-ha/content/ha-nn-config-cluster.html
Still no cigar...
07-11-2016
12:59 PM
SSHFence never seems to have worked for me with failover activated. I enabled sshfence, created an hdfs user on Ambari, generated a key with ssh-keygen for passwordless sessions, and manually tested the passwordless SSH connection. Everything seemed to have been set, yet it still didn't work: whenever one of my NameNodes failed over in the backend, Ambari falsely reported both as active.

So I went back to the default and used a script, as it's the only way I can get my primary NN to stay active and the secondary to stay standby. Essentially, the script pings the NNs every minute and, if they respond, checks their current status and forces them into an active:standby state. Theoretically, having the SNN as active and the NN as standby should be fine. However, Ambari never reports the status correctly: unless I force-transition the nodes, it doesn't report them as active:standby and hdfs://ClusterName fails to work. If someone has a better solution, I'd love to hear it.

For those wondering, I'm running Ambari 2.2.2.0 and HDP 2.4.2.0 on a CentOS 6 x64 environment.

Additionally, the documentation implies creating a user and writing an SSH script to run the fencing approach. What I don't get is the point of running such a script if the NN complains that "failover is activated... you cannot manually failover the nodes" or something along those lines. There's definitely something I'm missing. Anyhow, this workaround has been working for me, but it doesn't feel clean, and I'd like to know how the community handles HA and what scripting approach you use.
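The script itself isn't shown in the post, so the following is only a rough sketch of the decision step such a watchdog might use, not the author's actual code. It assumes the service IDs are nn1 (preferred active) and nn2 (preferred standby), and that the states were already collected with `hdfs haadmin -getServiceState`. The function name `plan_transitions` is made up for illustration. It only builds the command lines; actually running them, and answering the confirmation prompt that `--forcemanual` triggers, is left to the caller.

```python
def plan_transitions(states, active_id="nn1", standby_id="nn2"):
    """Return the haadmin commands needed to reach active_id=active, standby_id=standby.

    `states` maps service IDs to "active", "standby", or anything else
    (treated as "not responding"). The IDs nn1/nn2 are assumptions.
    """
    known = ("active", "standby")
    # Mirror the "if they respond" rule: do nothing unless both NNs answered.
    if any(states.get(sid) not in known for sid in (active_id, standby_id)):
        return []

    commands = []
    # Demote the unwanted active first so two NNs are never forced active at once.
    if states[standby_id] == "active":
        commands.append(
            ["hdfs", "haadmin", "-transitionToStandby", "--forcemanual", standby_id]
        )
    if states[active_id] == "standby":
        commands.append(
            ["hdfs", "haadmin", "-transitionToActive", "--forcemanual", active_id]
        )
    return commands
```

Run from cron once a minute, a wrapper could feed the collected states into `plan_transitions` and execute whatever it returns. Forcing transitions while automatic failover is enabled works against the failover controller, so this reflects the workaround described here rather than a recommended setup.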
06-29-2016
09:32 PM
So in the end, knowing my config was fine, I added stack.upgrade.bypass.prechecks=true to /etc/ambari-server/conf/ambari.properties and chose to disregard the warning. The upgrade went fine and all tests are green. Essentially, we went from HDP 2.1 back then to the bleeding edge, and some steps had to be done manually. So somehow, after a few tries, we succeeded, but most likely left some artifacts behind. I'm still interested in finding out where this entry is located and how I could clean it up. Thankfully, we're getting professional services soon and building a brand new pro-level cluster with the help of some Hortonworks engineers, so there won't be any weird or unknown configuration choices.
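For anyone scripting the same change, here is a minimal sketch of setting that property without creating duplicate entries. The helper name `set_property` is hypothetical, and it works on the file's text only, so reading and writing /etc/ambari-server/conf/ambari.properties (and restarting ambari-server, which is normally needed for the change to take effect) is left to the caller.

```python
def set_property(text, key, value):
    """Return properties-file text with `key=value` present exactly once."""
    out = []
    found = False
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith("#")
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            # Replace the first occurrence and drop any later duplicates.
            if not found:
                out.append(f"{key}={value}")
                found = True
            continue
        out.append(line)
    if not found:
        # Key was absent: append it at the end of the file.
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

Applying it twice gives the same result as applying it once, which makes it safe to rerun.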
06-29-2016
07:20 PM
"Further, change the status of host_role_command with id 1 to COMPLETED" How can this be done manually?
06-29-2016
06:55 PM
Unfortunately, I ran all the queries in the post and I don't see any abnormalities...
06-28-2016
08:12 PM
Then, if I run the ambari-server set-current.... command, I get the following:
ERROR: Exiting with exit code 1.
REASON: Error during setting current version. Http status code - 500. { "status" : 500, "message" : "org.apache.ambari.server.controller.spi.SystemException: Finalization failed. More details: \nSTDOUT: Begin finalizing the upgrade of cluster Timbit to version 2.4.2.0-258\n\nSTDERR: The following 516 host component(s) have not been upgraded to version 2.4.2.0-258. Please install and upgrade the Stack Version on those hosts and try again.\nHost components:\nPIG on host dn7.HugeData.lab\nPIG on host dn3.HugeData.lab\nPIG on host dn26.HugeData.lab\nPIG on host dn9.HugeData.lab\nPIG on host dn22.HugeData.lab\nPIG on host dn27.HugeData.lab\nPIG on host dn8.HugeData.lab\nPIG on host dn6.HugeData.lab\nPIG on host dn5.HugeData.lab\nPIG on host dn19.HugeData.lab\nPIG on host snn.HugeData.lab\nPIG on host dn18.HugeData.lab\nPIG on host dn21.HugeData.lab\nPIG on host dn17.HugeData.lab\nPIG on host dn23.HugeData.lab\nPIG on host dn25.HugeData.lab\nPIG on host dn1.HugeData.lab\nPIG on host dn15.HugeData.lab\nPIG on host dn14.HugeData.lab\nPIG on host esn2.HugeData.lab\nPIG on host esn.HugeData.lab\nPIG on host dn16.HugeData.lab\nPIG on host dn10.HugeData.lab\nPIG on host dn2.HugeData.lab\nPIG on host dn4.HugeData.lab\nPIG on host dn12.HugeData.lab\nPIG on host nn.HugeData.lab\nPIG on host dn11.HugeData.lab\nPIG on host dn28.HugeData.lab\nPIG on host dn20.HugeData.lab\nSPARK_JOBHISTORYSERVER on host esn2.HugeData.lab\nSPARK_CLIENT on host dn7.HugeData.lab\nSPARK_CLIENT on host dn3.HugeData.lab\nSPARK_CLIENT on host dn26.HugeData.lab\nSPARK_CLIENT on host dn9.HugeData.lab\nSPARK_CLIENT on host dn22.HugeData.lab\nSPARK_CLIENT on host ...
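With hundreds of entries in that message, a per-component tally is easier to read than the raw list. The following is a small sketch for producing one. It assumes the message is available as a string in the form shown above, with newlines embedded as the two characters `\n`; the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches entries such as "PIG on host dn7.HugeData.lab". The host must start
# with a letter or digit, so a truncated "on host ..." entry is skipped.
ENTRY_RE = re.compile(r"([A-Z0-9_]+) on host ([A-Za-z0-9][A-Za-z0-9.\-]*)")


def pending_components(message):
    """Return (component, host) pairs listed in the finalization error."""
    # Turn the escaped newlines from the JSON message into real ones.
    text = message.replace("\\n", "\n")
    return ENTRY_RE.findall(text)


def count_by_component(message):
    """Count how many hosts are listed for each component name."""
    return Counter(component for component, _host in pending_components(message))
```

If every installed component shows up on every host, that points at the recorded cluster version rather than at individual hosts.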
06-28-2016
07:47 PM
Finalize upgrade successful for nn.HugeData.lab/x.x.x.40:8020
Finalize upgrade successful for snn.HugeData.lab/x.x.x.41:8020
I then re-ran the check and it still fails :/
06-28-2016
07:26 PM
So I upgraded Ambari to the bleeding-edge 2.2.2.0 today and was about to roll out HDP 2.4.2.0-258, but I am stumped at the pre-check script after all the HDP-2.4.2.0 packages were successfully installed across the board.

Upgrade to HDP-2.4.2.0
Requirements
You must meet these requirements before you can proceed.
A previous upgrade did not complete.
Reason: Upgrade attempt (id: 1, request id: 2,681, from version: 2.2.6.0-2800, to version: 2.4.0.0-169) did not complete task with id 17,829 since its state is FAILED instead of COMPLETED. Please ensure that you called:
ambari-server set-current --cluster-name=$CLUSTERNAME --version-display-name=$VERSION_NAME
Further, change the status of host_role_command with id 1 to COMPLETED
Failed on: HugeData

I ran the command as instructed:
ambari-server set-current --cluster-name=HugeData --version-display-name=HDP-2.4.2.0

To no avail... I am stumped at this point and not sure where to look to change that manually in the backend. As far as I am concerned, we had been running 2.4.0.0-169 without any issues (except for the NN failover) for about a month. According to the error above, we missed something in the 2.2.x to 2.4.x upgrade. I'm sure there's a value I can edit to mark it as successful, but I am not sure which one right now. Your input would be much appreciated 🙂
06-16-2016
09:15 PM
1 Kudo
Good to know that I'm not going crazy, then... I have a feeling this is related: https://issues.apache.org/jira/browse/AMBARI-15235 It's mentioned as fixed, somewhat, in 2.2.2? I'm still on 2.2.1.1.
06-16-2016
05:00 PM
1 Kudo
Greetings!

So it seems that my configuration is wrong OR Ambari 2.2.1.1 has a refresh issue? Basically, I'm running a cluster with a high-availability NameNode configuration. For some unknown reason, when NN fails, SNN becomes the active node as expected, and NN goes into standby once the service is restarted. I can confirm the failover is successful by running hdfs haadmin -getServiceState nn1 and hdfs haadmin -getServiceState nn2. From that point, nn1 reports Standby and nn2 reports Active, respectively.

The funky part, however, is that on Ambari both NameNodes are marked as Active even though the backend failed over, when Ambari should report NN as Standby and SNN as Active. The DFS can still be written to by simply using the typical hdfs dfs -put test.log <path>/test.log

Now, to force Ambari to refresh the status, I run the following command: echo N | hdfs haadmin -transitionToStandby --forcemanual nn2
After that, nn2 is marked as Standby, nn1 becomes Active, Ambari refreshes to display NN as Active and SNN as Standby, and the world is happy.

So from a SysAdmin perspective, I can write data to the filesystem, I'm happy, and I consider this an Ambari bug. For my programmer colleague, however, it causes havoc, as he can't write/read/modify the file system from Java/API/hdfs://url.

Is this a known issue? Expected behaviour? And last but not least, what defines the hdfs://url value? Is there an additional parameter to add to that URL to refresh it?

Thanks!
Eric
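The backend check described above can be scripted so the result is easy to compare with what Ambari displays. This is a minimal sketch, assuming the service IDs are nn1 and nn2 as in the commands above and that `hdfs` is on the PATH. The function names and the labels returned by `classify_pair` are invented for illustration.

```python
import subprocess


def get_service_state(service_id, run=subprocess.run):
    """Return "active", "standby", or "unknown" for one NameNode service ID."""
    try:
        result = run(
            ["hdfs", "haadmin", "-getServiceState", service_id],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        # hdfs missing, NameNode unreachable within the timeout, etc.
        return "unknown"
    state = result.stdout.strip().lower()
    return state if state in ("active", "standby") else "unknown"


def classify_pair(states):
    """Summarise the two NameNode states as one label."""
    values = sorted(states.values())
    if values == ["active", "standby"]:
        return "healthy"
    if values == ["active", "active"]:
        return "both-active"
    if values == ["standby", "standby"]:
        return "no-active"
    return "unknown"
```

On a cluster node, `classify_pair({sid: get_service_state(sid) for sid in ("nn1", "nn2")})` gives the backend's view. A "healthy" result while Ambari shows both NameNodes as Active would match the mismatch described in this post.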
Labels: Apache Hadoop