Configure multihomed for HDFS to use high speed network for data

New Contributor

Hi all, I have a test Ambari cluster. Each node has two interfaces: 1GbE for management and 25GbE for data.

Both interfaces have had DNS/rDNS configured on a central DNS server.

I have gone through this guide, http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html, and enabled the options it describes through Ambari.
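For anyone checking the same thing, here is a rough Python sketch that reads an hdfs-site.xml and reports which multihoming settings are missing or set to something unexpected. The property names and values in `EXPECTED` are my reading of the linked Apache guide, not something taken from this cluster, so treat them as assumptions and verify them against your Hadoop version. The helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: properties and values as described in the Apache HDFS
# multihoming guide. Verify against the guide for your Hadoop version.
EXPECTED = {
    "dfs.namenode.rpc-bind-host": "0.0.0.0",
    "dfs.namenode.servicerpc-bind-host": "0.0.0.0",
    "dfs.namenode.http-bind-host": "0.0.0.0",
    "dfs.namenode.https-bind-host": "0.0.0.0",
    "dfs.client.use.datanode.hostname": "true",
    "dfs.datanode.use.datanode.hostname": "true",
}


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop config file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def multihoming_gaps(xml_text):
    """Return {name: actual_value} for settings that differ from EXPECTED.

    The actual value is None when the property is not present at all.
    """
    props = read_properties(xml_text)
    return {
        name: props.get(name)
        for name, wanted in EXPECTED.items()
        if props.get(name) != wanted
    }
```

To use it, read the file Ambari deployed on a node (often under /etc/hadoop/conf, but that path is an assumption) and pass its text to `multihoming_gaps`. An empty result only means the settings are present; it says nothing about DNS.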

When adding a new DataNode, I specify the hostname of its management interface. I ran a few benchmarks and found that HDFS does not use the 25GbE network at all.

Am I still missing something needed to fully enable multihoming for HDFS?

Cheers,

Derrick

Re: Configure multihomed for HDFS to use high speed network for data

Guru

For the fast network to be used, a connection from one DataNode to another has to go over the 25GbE network. This is most likely a DNS setup issue. Try ping and nc to see which network is used when connecting between two DataNodes.
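If you would rather script the check than run ping and nc by hand, something like the Python sketch below might help. It shows which IPv4 addresses a hostname resolves to and which local address the OS picks when connecting to it; the local address tells you which interface the traffic leaves on. The function names are made up for illustration.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses a hostname resolves to."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def local_source_address(host, port, timeout=3.0):
    """Open a TCP connection and return the local IP the OS chose for it."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return conn.getsockname()[0]
```

Run it on one DataNode against the hostname of another, using the port from your `dfs.datanode.address` setting (commonly 50010 on Hadoop 2 and 9866 on Hadoop 3, but check your own configuration). If either result falls in the management subnet rather than the 25GbE subnet, the name is resolving to the wrong interface.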

Re: Configure multihomed for HDFS to use high speed network for data

@Derrick Lin Please take a look at the article https://community.hortonworks.com/content/kbentry/24277/parameters-for-multi-homing.html

There is more to multi-homed configuration than the HDFS document covers, and the more complete discussion may help you resolve your problem, especially with respect to DNS and naming. The biggest question is: do the cluster hosts have the same name on all networks? Hope this helps.

Re: Configure multihomed for HDFS to use high speed network for data

New Contributor

Thanks everyone, I read:

  1. Each server should have one consistent hostname on all interfaces. Theoretically, DNS allows a server to have a different hostname per network interface. But it is required[2] for multi-homed Hadoop clusters that each server have only one hostname, consistent among all network interfaces used by Hadoop. (Network interfaces excluded from use by Hadoop may allow other hostnames.)

This is not the case in my environment, but that's OK. For now we will just register all nodes via the 25GbE high-speed network.
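For reference, the one-hostname rule quoted above can be checked with a small Python sketch like this one: reverse-resolve each interface address of a node and see whether they all map to the same name. The function names are hypothetical, and the addresses you pass in are whatever your node's interfaces actually use.

```python
import socket


def reverse_names(addresses):
    """Map each IP address to its reverse-DNS name, or None if lookup fails."""
    names = {}
    for addr in addresses:
        try:
            names[addr] = socket.gethostbyaddr(addr)[0].lower().rstrip(".")
        except OSError:
            # socket.herror and socket.gaierror are both OSError subclasses.
            names[addr] = None
    return names


def single_hostname(names):
    """True only if every address resolved and all share one hostname."""
    distinct = set(names.values())
    return len(distinct) == 1 and None not in distinct
```

Calling `single_hostname(reverse_names([...]))` with a node's management and data addresses returns False in a setup like mine, where each interface has its own name.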

Thanks