What should be the behaviour of HDFS HA in VPN environment

New Contributor

I have set up HDFS HA (NameNode High Availability) with one NameNode in our old office and one standby NameNode in our new office. The offices are connected through a VPN, and I am wondering what happens if VPN connectivity goes offline.

Re: What should be the behaviour of HDFS HA in VPN environment

Mentor

This is not a recommended setup. Hadoop clusters cannot span geographical areas with good results.

Re: What should be the behaviour of HDFS HA in VPN environment

What Artem said. Don't do it. The latency will be deadly for the performance of your cluster. If you want DR, create two clusters and sync them with DistCp, Falcon, WANdisco, ...

Apart from that, if the VPN goes down, one of the NameNodes will no longer be able to write to ZooKeeper, and the ZooKeeper failover controller will decide which one is still up. (Essentially, if you distributed ZooKeeper across both sites as well, the data center with more ZooKeeper nodes will win.) That NameNode will then become active.
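To illustrate the "more ZooKeeper nodes will win" point: ZooKeeper only keeps serving on the side that can still reach a strict majority of the ensemble. A minimal sketch of that arithmetic follows; the class name, method name, and the 2/1 split are made up for the example, not taken from any cluster in this thread.

```java
/** Sketch: which side of a network split can still form a ZooKeeper quorum. */
class QuorumCheck {

    /** A side has quorum only if it reaches a strict majority of the ensemble. */
    static boolean hasQuorum(int reachableNodes, int ensembleSize) {
        return 2 * reachableNodes > ensembleSize;
    }

    public static void main(String[] args) {
        // Hypothetical layout: 3 ZooKeeper nodes, 2 in one office and 1 in the other.
        int ensembleSize = 3;
        System.out.println("Office with 2 nodes has quorum: " + hasQuorum(2, ensembleSize));
        System.out.println("Office with 1 node has quorum:  " + hasQuorum(1, ensembleSize));
    }
}
```

Note that an even split (for example 2 and 2 out of 4) leaves neither side with a majority, so in that layout neither office could elect an active NameNode while the VPN is down.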

Re: What should be the behaviour of HDFS HA in VPN environment

New Contributor

@Artem Ervits, @Benjamin Leonhardi Thanks for your suggestions; I will keep them in mind.

Re: What should be the behaviour of HDFS HA in VPN environment

New Contributor

@Artem Ervits, @Benjamin Leonhardi can you please tell me:

How will a client know which NameNode is active?

I mean, I am sending requests to the HDFS service on port 8020, which initially uses the IP/DNS name of that NameNode machine.

If that NameNode goes offline, the other one will become active, and its IP/DNS name will be different. How would I know to send requests to that machine from then on?

Re: What should be the behaviour of HDFS HA in VPN environment

Which client? If you use WebHDFS, you need to check for a StandbyException in the response and switch to the other NameNode yourself.
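As a rough sketch of what "switch yourself" could look like in a Java WebHDFS client: try each NameNode in turn and skip any that is unreachable or answers with a StandbyException. Everything here is an assumption for illustration: the class and method names are hypothetical, the HTTP request is abstracted as a function so the logic stays visible, the host names and port 50070 are placeholders, and detecting the standby by searching the response body for the exception name is a simplification.

```java
import java.util.List;
import java.util.function.Function;

/** Sketch: manual NameNode failover for a WebHDFS-style client. */
class WebHdfsFailover {

    /**
     * Tries each NameNode in order and returns the first response that is
     * not a standby rejection.
     *
     * @param nameNodes base URLs of the NameNodes (placeholders in this sketch)
     * @param call      performs the HTTP request against one base URL and
     *                  returns the response body; throws if the host is down
     */
    static String callActive(List<String> nameNodes, Function<String, String> call) {
        for (String nameNode : nameNodes) {
            String body;
            try {
                body = call.apply(nameNode);
            } catch (RuntimeException unreachable) {
                continue; // host down or unreachable: try the next NameNode
            }
            // Simplification: treat any body naming StandbyException as "standby".
            if (!body.contains("StandbyException")) {
                return body;
            }
        }
        throw new IllegalStateException("No active NameNode among " + nameNodes);
    }

    public static void main(String[] args) {
        List<String> nameNodes = List.of("http://devnn1:50070", "http://devnn2:50070");
        // Stand-in for a real HTTP call: devnn1 pretends to be the standby.
        Function<String, String> fakeCall = base -> base.contains("devnn1")
                ? "{\"RemoteException\":{\"exception\":\"StandbyException\"}}"
                : "{\"boolean\":true}";
        System.out.println(callActive(nameNodes, fakeCall));
    }
}
```

In a real client the `call` function would issue the actual WebHDFS request, and you would probably want to remember which NameNode answered last so that later requests go there first.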

Other clients have access to hdfs-site.xml and will do the switch automatically. However, you need to specify the address using the HA name

hdfs://DEVHA

instead of

hdfs://devnn1:8020

or

hdfs://devnn2:8020

In hdfs-site.xml you will find the mapping from the HA name you chose to the two NameNodes, and clients such as the hadoop command use it to find the active one.
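For reference, the mapping is a handful of client-side properties. The sketch below just builds them as key/value pairs so you can see what a client needs. The property names are the standard HDFS HA client settings; the class name, the NameNode IDs nn1 and nn2, and the host:port values are assumptions based on the example names above, so check them against your own hdfs-site.xml.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: the client-side properties that map an HA name to two NameNodes. */
class HaClientSettings {

    static Map<String, String> forNameservice(String ns, String nn1Addr, String nn2Addr) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs.defaultFS", "hdfs://" + ns);                 // clients address the HA name, not a host
        props.put("dfs.nameservices", ns);
        props.put("dfs.ha.namenodes." + ns, "nn1,nn2");            // nn1/nn2 are assumed NameNode IDs
        props.put("dfs.namenode.rpc-address." + ns + ".nn1", nn1Addr);
        props.put("dfs.namenode.rpc-address." + ns + ".nn2", nn2Addr);
        // Proxy provider that tries the configured NameNodes until it finds the active one.
        props.put("dfs.client.failover.proxy.provider." + ns,
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return props;
    }

    public static void main(String[] args) {
        forNameservice("DEVHA", "devnn1:8020", "devnn2:8020")
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

In a Java application with the Hadoop client library on the classpath, each pair could be applied with `Configuration.set(key, value)` before calling `FileSystem.get(conf)`. Putting the cluster's own hdfs-site.xml and core-site.xml on the classpath should have the same effect without setting anything by hand.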

Re: What should be the behaviour of HDFS HA in VPN environment

New Contributor

Hi @Benjamin Leonhardi, I don't understand exactly what you mean by other clients in your answer.

I am using a small Java application that uploads/downloads documents to and from the DataNodes. So for Java, I guess WebHDFS is the client that makes the requests to HDFS. How can I configure the automatic switch between the two NameNodes? It would be great if you could help me out with a few more details.
