distcp update difference between two snapshot syntax

Hi,

Can anyone provide the syntax and a sample example for computing the difference between two snapshots and moving that difference to a target cluster using distcp?

AIM:

I have two clusters, clusterA and clusterB. I recently built clusterB and am moving all the data from clusterA to clusterB. Before moving the data, I took a snapshot on clusterA. Because clusterA remained active during the transfer, its data changed in the meantime. Now I want to move only the changed data from clusterA to clusterB. Can someone provide the syntax, with a simple example, for getting the difference and moving the changed data?

Thanks in advance.

10 REPLIES

@SBandaru -

Let's say s1 was the earlier snapshot. You will need to create a new snapshot (say s2) on the source cluster, like this:

/usr/hdp/current/hadoop-hdfs-client/bin/hdfs dfs -createSnapshot /tmp/source s2

And then run distcp like below:

/usr/hdp/current/hadoop-client/bin/hadoop distcp -update -diff s1 s2 /tmp/source /tmp/target

Hope this helps
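The steps in this reply can be sketched as a small script. This is only an illustration under stated assumptions: it assumes `hdfs` and `hadoop` are on the PATH (the reply uses the full /usr/hdp/current/... paths), the `run` helper and the `DRY_RUN` switch are hypothetical additions, and the extra `hdfs snapshotDiff` step is included only to preview the changes before copying. By default the script just prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
# SRC, DST, OLD and NEW are the placeholder values used in the reply above.
SRC="${SRC:-/tmp/source}"
DST="${DST:-/tmp/target}"
OLD="${OLD:-s1}"
NEW="${NEW:-s2}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

sync_snapshot_diff() {
  # 1. Take the newer snapshot on the source directory.
  run hdfs dfs -createSnapshot "$SRC" "$NEW"
  # 2. Preview what changed between the two snapshots.
  run hdfs snapshotDiff "$SRC" "$OLD" "$NEW"
  # 3. Copy only that difference to the target.
  run hadoop distcp -update -diff "$OLD" "$NEW" "$SRC" "$DST"
}

sync_snapshot_diff
```

Running it as `DRY_RUN=0 SRC=/data/a DST=/data/a_target ./sync.sh` (script name hypothetical) would execute the three commands against the given directories.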

@Namit Maheshwari

Thanks for the quick response. I tried it that way, but I'm getting the error message below. Any help is highly appreciated.

17/03/30 21:39:38 WARN retry.RetryInvocationHandler: Exception while invoking class org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getSnapshotDiffReport over null. Not retrying because try once and fail.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.SnapshotException): Cannot find the snapshot of directory /tmp/sbandaru with name sbandaru
        at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.getSnapshotByName(DirectorySnapshottableFeature.java:285)
        at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.computeDiff(DirectorySnapshottableFeature.java:257)
        at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.diff(SnapshotManager.java:372)
        at org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.getSnapshotDiffReport(FSDirSnapshotOp.java:155)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getSnapshotDiffReport(FSNamesystem.java:7674)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getSnapshotDiffReport(NameNodeRpcServer.java:1792)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getSnapshotDiffReport(ClientNamenodeProtocolServerSideTranslatorPB.java:1149)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2273)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2269)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2267)

17/03/30 21:57:46 WARN tools.DistCp: Failed to compute snapshot diff on hdfs://hadoop.hortonworks.com:8020/tmp/sbandaru
org.apache.hadoop.hdfs.protocol.SnapshotException: Cannot find the snapshot of directory /tmp/sbandaru with name sbandaru
        at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.getSnapshotByName(DirectorySnapshottableFeature.java:285)
        at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.computeDiff(DirectorySnapshottableFeature.java:257)
        at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.diff(SnapshotManager.java:372)
        at org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.getSnapshotDiffReport(FSDirSnapshotOp.java:155)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getSnapshotDiffReport(FSNamesystem.java:7674)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getSnapshotDiffReport(NameNodeRpcServer.java:1792)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getSnapshotDiffReport(ClientNamenodeProtocolServerSideTranslatorPB.java:1149)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)

@SBandaru - It's not able to find the snapshot of the directory:

Cannot find the snapshot of directory /tmp/sbandaru with name sbandaru

Can you please post how you created the snapshot, where the snapshot is located, and the command you issued to run distcp?

@Namit Maheshwari

Below is the requested information.

[sbandaru@hadoop ~]$ hdfs dfs -ls /user/sbandaru/.snapshot
Found 3 items
drwxr-x---   - sbandaru sbandaru          0 2017-03-30 11:38 /user/sbandaru/.snapshot/afterdistcp
drwxr-x---   - sbandaru sbandaru          0 2016-11-08 19:57 /user/sbandaru/.snapshot/sbandaru
drwxr-x---   - sbandaru sbandaru          0 2016-11-08 19:57 /user/sbandaru/.snapshot/sbandaru2
[sbandaru@hadoop ~]$


[sbandaru@hadoop ~]$ hadoop --loglevel DEBUG distcp -update -diff sbandaru afterdistcp /user/sbandaru hdfs://hadoop.hortonworks.com:8020/tmp/sbandaru

I created the snapshots on the /user/sbandaru directory, and I'm trying to get the difference between the old and new snapshots and move that difference to the location /tmp/sbandaru.

@Namit Maheshwari

"WARN tools.DistCp: Failed to compute snapshot diff on hdfs://hadoop.hortonworks.com:8020/tmp/sbandaru"

The line above is part of the error message and refers to my target location. Why is it trying to find the snapshot in the target location?
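The warning appears consistent with how the DistCp `-diff` option is documented: the target directory is expected to hold a snapshot with the same name as the older source snapshot and to be unchanged since that snapshot, so DistCp queries the target for it as well. The sketch below shows one way to check and prepare the target. It assumes the target path /tmp/sbandaru and the snapshot name sbandaru from this thread; the `run` helper and `DRY_RUN` switch are hypothetical, and the commands are only printed unless `DRY_RUN=0` is set. Creating the target snapshot after the fact is only sound if the target content still matches the source's older snapshot.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
# TARGET and OLD are taken from the thread and may differ on your cluster.
TARGET="${TARGET:-/tmp/sbandaru}"
OLD="${OLD:-sbandaru}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

prepare_target() {
  # Show which snapshots the target directory currently has.
  run hdfs dfs -ls "$TARGET/.snapshot"
  # An administrator has to mark the directory snapshottable first.
  run hdfs dfsadmin -allowSnapshot "$TARGET"
  # Create the baseline snapshot under the same name as on the source.
  run hdfs dfs -createSnapshot "$TARGET" "$OLD"
}

prepare_target
```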

Contributor

I have the same issue when trying to compute the diff.

hadoop distcp -diff s1 s2 -update /data/a /data/a_target

/data/a_target is on another cluster. s1 (yesterday's snapshot) and s2 (today's snapshot) sit side by side on the first cluster, of course. I wonder if -diff needs only the snapshot name, and not the absolute path.

Contributor

Hmm... so it does appear that you need to provide just the snapshot names for s1 and s2. Interesting.
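As a small illustration of that finding, a full `.snapshot` path can be reduced to the bare name before it is passed to `-diff`. The `snapshot_name` function is a hypothetical helper, and the path is the one listed earlier in the thread.

```shell
#!/usr/bin/env bash
# Reduce a .snapshot path to the bare snapshot name that -diff expects.
snapshot_name() {
  basename "$1"
}

snapshot_name /user/sbandaru/.snapshot/afterdistcp   # prints: afterdistcp
```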

@SBandaru - Is your issue resolved, or do you need any further help here?