06-17-2015 12:09 PM - edited 06-17-2015 12:48 PM
We are currently trying to set up HBase replication from a cluster on our own hardware to a new one in our VPC on AWS
(for disaster recovery).
The Amazon nodes have private DNS names like ip-10-3-1-61.us-west-2.compute.internal and private IPs like 10.3.1.61.
All nodes on our hardware can ping, ssh, telnet etc to the Amazon nodes (using the IP or FQDN), and vice versa. There are entries in the
/etc/hosts files on all servers on both sides to ensure resolution. Our servers are running Ubuntu 12.04, the AWS ones are Red Hat 6.
Both clusters are running HBase 0.98.6, and both have hbase.replication set to true. The tables we want replicated have REPLICATION_SCOPE set to 1. I've
added the peer entry in the hbase shell.
So everything seems good. However, when I start replication I see this in the HBase logs on the master (source) side:
2015-06-16 12:37:15,643 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't replicate because of a local or network error:
java.net.UnknownHostException: unknown host: ip-10-3-1-62.us-west-2.compute.internal
I understand the error; I just don't understand where it is coming from. Which process is unable to resolve the hostname?
Thanks in advance,
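One way to narrow down which machine cannot resolve the name is to run the same lookup on every node of both clusters, not only on the source side. The following is a minimal sketch, assuming Python is available on the nodes; the function name unresolved_hosts is hypothetical, and the second hostname's place in the peer list is inferred from the log line above. It asks the operating system resolver, which consults /etc/hosts and DNS in the usual order, so it only approximates what the HBase JVM sees.

```python
import socket


def unresolved_hosts(hostnames):
    """Return the hostnames that the local resolver cannot resolve."""
    failed = []
    for name in hostnames:
        try:
            # Uses the OS resolver (typically /etc/hosts first, then DNS).
            socket.getaddrinfo(name, None)
        except socket.gaierror:
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Hostnames taken from the post; extend with every node of both clusters.
    peers = [
        "ip-10-3-1-61.us-west-2.compute.internal",
        "ip-10-3-1-62.us-west-2.compute.internal",
    ]
    for name in unresolved_hosts(peers):
        print("cannot resolve:", name)
```

If the script prints nothing on the source nodes but reports a name on a destination node, the lookup failure is on the destination side.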
06-17-2015 02:00 PM
I found the problem. The /etc/hosts files on the AWS cluster nodes needed to contain entries for each other.
I had added entries for the source cluster nodes, but the AWS nodes themselves needed to be in there too.
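A quick way to catch this kind of gap is to compare each node's /etc/hosts against the full list of nodes from both clusters. The sketch below is only an illustration: the helper names hosts_entries and missing_hosts are hypothetical, onprem-node1.example.com and the 192.0.2.10 address stand in for a source cluster node whose real name is not given in this thread, and the sample file mimics an AWS node that lists the source cluster but not its AWS neighbour.

```python
def hosts_entries(hosts_text):
    """Collect every hostname and alias listed in /etc/hosts-style text."""
    names = set()
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # fields[0] is the IP address; the rest are names and aliases.
        names.update(name.lower() for name in fields[1:])
    return names


def missing_hosts(hosts_text, required):
    """Return the required hostnames that have no entry in hosts_text."""
    present = hosts_entries(hosts_text)
    return [name for name in required if name.lower() not in present]


if __name__ == "__main__":
    # Hypothetical hosts file on the AWS node ip-10-3-1-61.
    sample = """\
127.0.0.1   localhost
192.0.2.10  onprem-node1.example.com onprem-node1  # source cluster (made up)
10.3.1.61   ip-10-3-1-61.us-west-2.compute.internal
"""
    required = [
        "onprem-node1.example.com",
        "ip-10-3-1-61.us-west-2.compute.internal",
        "ip-10-3-1-62.us-west-2.compute.internal",
    ]
    print(missing_hosts(sample, required))
```

On a real node, the text would come from reading /etc/hosts, and the required list would hold every node of both clusters.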