07-26-2017 11:25 AM
This is related to the post - distcp post
We are trying to export a snapshot from one cluster to another using the command below:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot mysnapshot -copy-from hdfs://namenode1:8020/hbase -copy-to hdfs://namenode2:8020/hbase
We are running HBase 1.0.0 cdh5.5.1+274-1.cdh5.5.1.p0.15.e17. Ports 8020 and 50010 are open between the two clusters, i.e., I can telnet to these ports. When the command runs, an empty file, /hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo, is created on namenode2. The error received is:
INFO [main] snapshot.ExportSnapshot: Copy Snapshot Manifest
WARN [Thread-6] hdfs.DFSClient: DataStreamer Exception
java.nio.channels.UnresolvedAddressException
    at sun.nio.ch.Net.checkAddress(Net.java:101)
    .....
    .....
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:668)
Exception in thread "main" org.apache.hadoop.hbase.snapshot.ExportSnapshotException: Failed to copy the snapshot directory: from=hdfs://namenode1:8020/hbase/.hbase-snapshot/contentSnapshot to=hdfs://namenode2:8020/hbase/.hbase-snapshot/.tmp/contentSnapshot
    at org.apache.hadoop.hbase.snapshot.ExportSnapshot.run(ExportSnapshot.java:932)
    .....
and more and more ...
So, anyway, I am trying to debug this, and I cannot find the ExportSnapshot source code for the specific version of the software we are running. Some connection between the two clusters is being made, because the empty file is created. It seems to fail when copying the data.manifest (maybe?).
I guess one question is: can the source code for HBase 1.0.0 on CDH 5.5.1 be located?
07-28-2017 08:02 AM
This problem was finally resolved.
For anyone else having similar, seemingly quirky, problems with ExportSnapshot, here is how we resolved it. FYI: finding the source code for the version of ExportSnapshot we were running helped pinpoint exactly where the error was occurring and what had already executed successfully up to that point. I also ran it in different ways from each cluster and found an unusual alias being used when trying the hftp://server:50070 port.
So the bottom line was that every node in each cluster, all namenodes and datanodes, had to be able to resolve EVERY alias in use (by adding them to /etc/hosts), including all internal-IP aliases mapped to the external IPs, whether or not you thought the alias was explicitly being used by Hadoop or HBase somewhere.
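A quick way to check for this before running the export is to test name resolution on each node. The following is only a minimal sketch, not part of HBase or Hadoop: the function name unresolved_hosts is made up for illustration, and the list of hostnames and aliases to check is something you have to supply yourself from both clusters.

```python
import socket
import sys


def unresolved_hosts(hostnames, resolve=socket.getaddrinfo):
    """Return the hostnames that this machine cannot resolve.

    `resolve` defaults to socket.getaddrinfo, which consults the same
    sources as most clients on the node (/etc/hosts, DNS, ...).
    """
    failed = []
    for name in hostnames:
        try:
            resolve(name, None)
        except socket.gaierror:
            # Name lookup failed: this alias is missing on this node.
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Hostnames/aliases are passed on the command line (hypothetical usage):
    #   python check_hosts.py namenode1 namenode2 <every datanode alias...>
    for name in unresolved_hosts(sys.argv[1:]):
        print("cannot resolve:", name)
```

Run it on every namenode and datanode in both clusters with the full list of names and aliases; anything it prints is a candidate for an /etc/hosts entry on that node.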
Thanks to all. Better error messages and/or the ability to debug the code would have been helpful.
02-06-2019 01:47 AM
Thanks for sharing the steps to resolve the issue. Yes, indeed, every NN/DN in each cluster should be able to reach the other cluster's nodes, and vice versa, since ExportSnapshot is essentially an HDFS distcp-style operation: the majority of the work involves copying the HFiles associated with the snapshot from the source to the target in a distributed fashion.
It would be helpful if you could share the complete stack trace of the exception, which would also help in understanding the flow during the failure.
Again thanks for taking the time to post the solution.