Do HBase and HDFS need to be co-located on the same machines? If so, how much?

Expert Contributor

Customer has "Cluster A" (a 20-node standard Hadoop cluster: HDFS, YARN, Hive, etc., but no HBase). Customer is adding "Cluster B" (6 nodes dedicated to HBase). Cluster A and Cluster B are on neighboring racks in the same datacenter, same VLAN, etc.

Is it technically safe/possible to install the RegionServers in "Cluster B", but point them to the HDFS instance in "Cluster A"?

If this is possible, what compromises would we make in terms of HBase performance? Would certain SCANs be slower because the RegionServers load remote HFiles into memory? Would writes be slower because no DataNode service runs alongside the HBase servers in Cluster B?
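
To make the proposed setup concrete: HBase locates its storage through the `hbase.rootdir` property in `hbase-site.xml`, so pointing Cluster B's RegionServers at Cluster A's HDFS would roughly mean setting that property to a URI on Cluster A's NameNode. The fragment below is only a sketch. The hostname `namenode.cluster-a.example.com`, the port `8020`, and the `/hbase` path are placeholder assumptions, not values from this environment.

```xml
<!-- hbase-site.xml on the Cluster B nodes (sketch only) -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- Hypothetical NameNode host, port, and path; substitute Cluster A's real values -->
    <value>hdfs://namenode.cluster-a.example.com:8020/hbase</value>
  </property>
</configuration>
```

With a layout like this, every HFile read and every WAL write from Cluster B crosses the network to a DataNode in Cluster A, which is the performance trade-off being asked about.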

Thanks!

4 REPLIES

Master Mentor

@Wes Floyd Great question! @Enis @Josh Elser

Apache JIRA HDFS-347 contains some benchmarks related to HDFS short-circuit read. There is a lot of commentary on that issue, so it would take some effort to scan through and find the relevant comments about the benchmarks.
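
For context, short-circuit read lets an HDFS client (here, a RegionServer) read block files directly from local disk instead of going through the DataNode, so it only helps when the RegionServer and the DataNode holding the block run on the same host. With RegionServers in Cluster B and DataNodes in Cluster A, it would not apply. A minimal sketch of the `hdfs-site.xml` properties that enable it follows. The socket path is an assumed example location and should be adjusted to the actual installation.

```xml
<!-- hdfs-site.xml (sketch only; effective only where client and DataNode share a host) -->
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <!-- Assumed example path for the UNIX domain socket shared by DataNode and client -->
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>
```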

Expert Contributor

Very helpful guys. Appreciated!