Created on 06-17-2015 12:09 PM - edited 09-16-2022 02:31 AM
We are currently trying to set up HBase replication from a cluster on our own hardware to a new one in our VPC on AWS
(for disaster recovery).
The Amazon nodes have private DNS names like ip-10-3-1-61.us-west-2.compute.internal and private IPs like 10.3.1.61.
All nodes on our hardware can reach the Amazon nodes with ping, ssh, telnet, etc. (using the IP or FQDN), and vice versa. There are entries in the
/etc/hosts files on all servers on both sides to ensure resolution. Our servers are running Ubuntu 12.04; the AWS ones are Red Hat 6.
Both clusters are running HBase 0.98.6, and both have replication set to true. The tables we want replicated have REPLICATION_SCOPE set to 1. I've
added the peer entry in the hbase shell.
So everything seems good. However, when I start replication I see this in the HBase logs on the master side:
2015-06-16 12:37:15,643 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't replicate because of a local or network error:
java.net.UnknownHostException: unknown host: ip-10-3-1-62.us-west-2.compute.internal
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.<init>(RpcClient.java:385)
at org.apache.hadoop.hbase.ipc.RpcClient.createConnection(RpcClient.java:351)
at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1530)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.replicateWALEntry(AdminProtos.java:21036)
at org.apache.hadoop.hbase.protobuf.ReplicationProtbufUtil.replicateWALEntry(ReplicationProtbufUtil.java:65)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:730)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:388)
I understand the error; I just don't understand where it's coming from. Which process is not able to resolve the hostname?
Any ideas?
Thanks in advance,
Paul
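One way to narrow down which machine cannot resolve a name is to run the same lookup on every node of both clusters. Below is a minimal Python sketch of that idea. The helper name `unresolved` is hypothetical, and it assumes the HBase JVM uses the same OS-level resolver (including /etc/hosts) that Python's `socket.gethostbyname` uses, which may not hold for every setup.

```python
import socket


def unresolved(hostnames, resolve=socket.gethostbyname):
    """Return the hostnames that the given resolver cannot resolve."""
    failed = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Hostnames taken from the post; extend with every node of both clusters.
    peers = [
        "ip-10-3-1-61.us-west-2.compute.internal",
        "ip-10-3-1-62.us-west-2.compute.internal",
    ]
    for name in unresolved(peers):
        print("cannot resolve:", name)
```

Running this on each source node and each AWS node shows which side is missing an entry, since a name that resolves from one machine can still be unknown on another.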
Created 06-17-2015 02:00 PM
I realised the problem. My /etc/hosts file on the AWS cluster nodes needed to contain references to each other.
I had added references to the source cluster nodes, but the AWS nodes needed to be in there too.
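For illustration, here is a minimal Python sketch of what such entries might look like. It assumes the private DNS naming pattern quoted in the question (10.3.1.61 maps to ip-10-3-1-61.us-west-2.compute.internal). The helper name `hosts_line` is hypothetical, and the 10.3.1.62 address is inferred from the hostname in the log, not stated in the thread.

```python
def hosts_line(private_ip, region="us-west-2"):
    """Build one /etc/hosts line for an AWS private IP such as 10.3.1.61."""
    short = "ip-" + private_ip.replace(".", "-")
    fqdn = f"{short}.{region}.compute.internal"
    # Format: <ip> <fully qualified name> <short alias>
    return f"{private_ip} {fqdn} {short}"


if __name__ == "__main__":
    # Only .61 and .62 appear in the thread; list every AWS node here.
    for ip in ["10.3.1.61", "10.3.1.62"]:
        print(hosts_line(ip))
    # prints:
    # 10.3.1.61 ip-10-3-1-61.us-west-2.compute.internal ip-10-3-1-61
    # 10.3.1.62 ip-10-3-1-62.us-west-2.compute.internal ip-10-3-1-62
```

The printed lines would go into /etc/hosts on every AWS node, next to the entries for the source cluster nodes, so that each AWS node can resolve its peers as well as the source side.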
Created 06-17-2015 12:12 PM
Created 06-17-2015 12:18 PM
Thanks Gautam,
Yes, both 'DNS Resolution' and 'DNS Hostnames' are set to YES for the VPC.
Paul
Created on 08-11-2018 04:37 AM - edited 08-11-2018 04:37 AM
Could you please post an example of your hosts file?
What did you put there?
Thanks a lot