1) I have set up HDFS HA (NameNode High Availability) with one NameNode in the old office and one NameNode (the standby) in the new office. The offices are connected through a VPN, and I am wondering what happens if VPN connectivity goes offline.
What Artem said. Don't do it. The latency will be deadly for the performance of your cluster. If you want DR, create two clusters and sync them with DistCp, Falcon, WANdisco, ...
Apart from that, if the VPN goes down, one of the NameNodes will not be able to write to ZooKeeper, and the ZooKeeper failover controller will decide who is still up. (Essentially, if you distributed ZooKeeper across both sites as well, the data center with more ZooKeeper nodes will win.) That NameNode will become active.
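To make the "more ZooKeeper nodes wins" point concrete, here is a small sketch of the majority rule. The class name and the node counts are made up for illustration; the only thing it relies on is that a ZooKeeper ensemble needs a strict majority of its members to keep working.

```java
class QuorumCheck {

    // True when the reachable members form a strict majority of the ensemble.
    static boolean hasQuorum(int reachable, int ensembleSize) {
        return reachable > ensembleSize / 2;
    }

    public static void main(String[] args) {
        // Hypothetical layout: 2 ZooKeeper nodes in one office, 1 in the other.
        int ensembleSize = 3;
        System.out.println("office with 2 nodes keeps quorum: " + hasQuorum(2, ensembleSize));
        System.out.println("office with 1 node keeps quorum: " + hasQuorum(1, ensembleSize));
    }
}
```

Note that with an even split (say 2 and 2 out of 4) neither side has a majority, so neither office could elect an active NameNode while the VPN is down.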
How will the client know which NameNode is active?
I mean, I am sending requests to the HDFS service on port 8020, which initially uses the IP/DNS name of that NameNode machine.
Now, if that NameNode goes offline, the other one will become active and its IP/DNS name will be different. How would I know to send requests to that machine from then on?
Which client? If you use WebHDFS, you need to check for a StandbyException in the response to the call and switch to the other NameNode yourself.
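A minimal sketch of that manual switch, using only the JDK (Java 9+ for `readAllBytes`). It assumes a standby NameNode answers with an HTTP error whose JSON body names `StandbyException`; the class and method names are hypothetical, and the NameNode HTTP addresses are whatever your cluster uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.List;

class WebHdfsFailover {

    // Builds a WebHDFS REST URL; hdfsPath must start with "/".
    static String statusUrl(String hostAndPort, String hdfsPath) {
        return "http://" + hostAndPort + "/webhdfs/v1" + hdfsPath + "?op=GETFILESTATUS";
    }

    // Assumption: a standby NameNode replies with an HTTP error
    // whose body mentions StandbyException.
    static boolean isStandbyResponse(int status, String body) {
        return status >= 400 && body != null && body.contains("StandbyException");
    }

    // Tries each NameNode in order and returns the first response body
    // that is not a standby rejection (other errors are returned as-is).
    static String getFileStatus(List<String> nameNodes, String hdfsPath) throws IOException {
        IOException lastFailure = null;
        for (String nameNode : nameNodes) {
            try {
                HttpURLConnection conn = (HttpURLConnection)
                        URI.create(statusUrl(nameNode, hdfsPath)).toURL().openConnection();
                conn.setConnectTimeout(5000);
                conn.setReadTimeout(5000);
                int status = conn.getResponseCode();
                String body;
                try (InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream()) {
                    body = in == null ? "" : new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
                if (isStandbyResponse(status, body)) {
                    continue; // this NameNode is the standby, try the other one
                }
                return body;
            } catch (IOException e) {
                lastFailure = e; // NameNode unreachable, try the next one
            }
        }
        throw new IOException("No active NameNode answered", lastFailure);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.out.println("usage: WebHdfsFailover <nn1:port> <nn2:port> <hdfsPath>");
            return;
        }
        System.out.println(getFileStatus(List.of(args[0], args[1]), args[2]));
    }
}
```

A real client would probably remember which NameNode answered last and try that one first, rather than always starting from the top of the list.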
Other clients have access to hdfs-site.xml and will do the switch automatically. You do, however, need to specify the address using the HA name (the nameservice ID) rather than a single NameNode's host name.
In hdfs-site.xml you will find the mapping from the HA name you chose to the two NameNodes, and clients like the hadoop command will use that mapping to find the active one.
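For reference, this is roughly what that mapping looks like, sketched here as plain key/value pairs built with the JDK only. The property names are the standard HDFS HA client settings; the nameservice `mycluster`, the NameNode IDs `nn1`/`nn2`, and the host names are placeholders, not values from your cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HaClientSettings {

    // Builds the client-side HA properties for one nameservice.
    // nameNodeRpcAddresses maps a NameNode ID (e.g. "nn1") to its host:port.
    static Map<String, String> haProperties(String nameservice,
                                            Map<String, String> nameNodeRpcAddresses) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("dfs.nameservices", nameservice);
        props.put("dfs.ha.namenodes." + nameservice,
                String.join(",", nameNodeRpcAddresses.keySet()));
        for (Map.Entry<String, String> nn : nameNodeRpcAddresses.entrySet()) {
            props.put("dfs.namenode.rpc-address." + nameservice + "." + nn.getKey(),
                    nn.getValue());
        }
        // Tells the client how to locate the active NameNode among the configured ones.
        props.put("dfs.client.failover.proxy.provider." + nameservice,
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return props;
    }

    // Clients address the cluster by nameservice, not by a NameNode host.
    static String defaultFs(String nameservice) {
        return "hdfs://" + nameservice;
    }

    public static void main(String[] args) {
        Map<String, String> nameNodes = new LinkedHashMap<>();
        nameNodes.put("nn1", "nn-old-office.example.com:8020"); // placeholder host
        nameNodes.put("nn2", "nn-new-office.example.com:8020"); // placeholder host
        haProperties("mycluster", nameNodes)
                .forEach((key, value) -> System.out.println(key + " = " + value));
        System.out.println("fs.defaultFS = " + defaultFs("mycluster"));
    }
}
```

In an application that uses the Hadoop client libraries, these values would normally come from hdfs-site.xml on the classpath (or be set on a `Configuration` object), and the file system would be opened with the `hdfs://mycluster` style URI instead of a specific NameNode address.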
Hi @Benjamin Leonhardi, I don't understand exactly what you mean by "other clients" in your answer.
I am using a small Java application that uploads and downloads documents to and from the DataNodes. So for Java, I guess WebHDFS is the client that makes the requests to HDFS. How can I configure the automatic switch between the two NameNodes? It would be great if you could add a few more details to your answer.