<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: distcp update difference between two snapshot syntax in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182035#M58627</link>
    <description>&lt;P&gt;I am facing a problem: when I run snapshotDiff from a remote cluster, it fails with a "snapshot not found" error even though the snapshot exists. Is there a solution for this? We built a DR cluster and run distcp from the DR side so that the copy uses DR resources instead of overloading PROD. How can this be achieved?&lt;/P&gt;</description>
    <pubDate>Mon, 16 Jul 2018 18:33:43 GMT</pubDate>
    <dc:creator>Sreedhar_ch</dc:creator>
    <dc:date>2018-07-16T18:33:43Z</dc:date>
    <item>
      <title>distcp update difference between two snapshot syntax</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182025#M58617</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Can anyone provide the syntax and a sample example for finding the difference between two snapshots and copying that difference to a target cluster using distcp?&lt;/P&gt;&lt;P&gt;AIM:&lt;/P&gt;&lt;P&gt;I have two clusters, clusterA and clusterB. I recently built clusterB and am moving all the data from clusterA to clusterB. Before moving the data, I took a snapshot on clusterA. Because clusterA was still active during the transfer, the data changed. Now I want to move only the changed data from clusterA to clusterB. Can someone provide the syntax, with a simple example, for getting the difference and moving the changed data?&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 04:15:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182025#M58617</guid>
      <dc:creator>bandarusridhar1</dc:creator>
      <dc:date>2017-03-31T04:15:52Z</dc:date>
    </item>
    <item>
      <title>Re: distcp update difference between two snapshot syntax</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182026#M58618</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/5746/bandarusridhar1.html" nodeid="5746"&gt;@SBandaru&lt;/A&gt; -&lt;/P&gt;&lt;P&gt;Let's say s1 was the earlier snapshot. You will need to create the latest snapshot (say s2) on the source cluster, like this:&lt;/P&gt;&lt;PRE&gt;/usr/hdp/current/hadoop-hdfs-client/bin/hdfs dfs -createSnapshot /tmp/source s2&lt;/PRE&gt;&lt;P&gt;Then run distcp as shown below:&lt;/P&gt;&lt;PRE&gt;/usr/hdp/current/hadoop-client/bin/hadoop distcp -update -diff s1 s2 /tmp/source /tmp/target&lt;/PRE&gt;&lt;P&gt;Hope this helps.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 04:33:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182026#M58618</guid>
      <dc:creator>namaheshwari</dc:creator>
      <dc:date>2017-03-31T04:33:02Z</dc:date>
    </item>
    <item>
      <title>Re: distcp update difference between two snapshot syntax</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182027#M58619</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/102/nmaheshwari.html" nodeid="102"&gt;@Namit Maheshwari&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Thanks for the quick response. I tried it the same way, but I'm getting the error message below. Any help is highly appreciated.&lt;/P&gt;&lt;PRE&gt;17/03/30 21:39:38 WARN retry.RetryInvocationHandler: Exception while invoking class org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getSnapshotDiffReport over null. Not retrying because try once and fail.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.SnapshotException): Cannot find the snapshot of directory /tmp/sbandaru with name sbandaru
        at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.getSnapshotByName(DirectorySnapshottableFeature.java:285)
        at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.computeDiff(DirectorySnapshottableFeature.java:257)
        at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.diff(SnapshotManager.java:372)
        at org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.getSnapshotDiffReport(FSDirSnapshotOp.java:155)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getSnapshotDiffReport(FSNamesystem.java:7674)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getSnapshotDiffReport(NameNodeRpcServer.java:1792)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getSnapshotDiffReport(ClientNamenodeProtocolServerSideTranslatorPB.java:1149)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2273)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2269)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2267)

17/03/30 21:57:46 WARN tools.DistCp: Failed to compute snapshot diff on hdfs://hadoop.hortonworks.com:8020/tmp/sbandaru
org.apache.hadoop.hdfs.protocol.SnapshotException: Cannot find the snapshot of directory /tmp/sbandaru with name sbandaru
        at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.getSnapshotByName(DirectorySnapshottableFeature.java:285)
        at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.computeDiff(DirectorySnapshottableFeature.java:257)
        at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.diff(SnapshotManager.java:372)
        at org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.getSnapshotDiffReport(FSDirSnapshotOp.java:155)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getSnapshotDiffReport(FSNamesystem.java:7674)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getSnapshotDiffReport(NameNodeRpcServer.java:1792)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getSnapshotDiffReport(ClientNamenodeProtocolServerSideTranslatorPB.java:1149)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)

&lt;/PRE&gt;</description>
      <pubDate>Fri, 31 Mar 2017 09:50:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182027#M58619</guid>
      <dc:creator>bandarusridhar1</dc:creator>
      <dc:date>2017-03-31T09:50:51Z</dc:date>
    </item>
    <item>
      <title>Re: distcp update difference between two snapshot syntax</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182028#M58620</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/5746/bandarusridhar1.html" nodeid="5746"&gt;@SBandaru&lt;/A&gt; - It's not able to find the snapshot of the directory:&lt;/P&gt;&lt;PRE&gt;Cannot find the snapshot of directory /tmp/sbandaru with name sbandaru&lt;/PRE&gt;&lt;P&gt;Can you please post how you created the snapshot, the location of the snapshot, and the command you issued to run distcp?&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 10:01:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182028#M58620</guid>
      <dc:creator>namaheshwari</dc:creator>
      <dc:date>2017-03-31T10:01:45Z</dc:date>
    </item>
    <item>
      <title>Re: distcp update difference between two snapshot syntax</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182029#M58621</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/102/nmaheshwari.html" nodeid="102"&gt;@Namit Maheshwari&lt;/A&gt;
&lt;/P&gt;&lt;P&gt;Below is the requested information.&lt;/P&gt;&lt;PRE&gt;[sbandaru@hadoop ~]$ hdfs dfs -ls /user/sbandaru/.snapshot
Found 3 items
drwxr-x---   - sbandaru sbandaru          0 2017-03-30 11:38 /user/sbandaru/.snapshot/afterdistcp
drwxr-x---   - sbandaru sbandaru          0 2016-11-08 19:57 /user/sbandaru/.snapshot/sbandaru
drwxr-x---   - sbandaru sbandaru          0 2016-11-08 19:57 /user/sbandaru/.snapshot/sbandaru2
[sbandaru@hadoop ~]$


[sbandaru@hadoop ~]$ hadoop --loglevel DEBUG distcp -update -diff sbandaru afterdistcp /user/sbandaru hdfs://hadoop.hortonworks.com:8020/tmp/sbandaru

&lt;/PRE&gt;&lt;P&gt;I created the snapshots on the /user/sbandaru directory, and I am trying to get the difference between the old and new snapshots and move that difference to the location /tmp/sbandaru.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 10:52:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182029#M58621</guid>
      <dc:creator>bandarusridhar1</dc:creator>
      <dc:date>2017-03-31T10:52:39Z</dc:date>
    </item>
    <item>
      <title>Re: distcp update difference between two snapshot syntax</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182030#M58622</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/102/nmaheshwari.html" nodeid="102"&gt;@Namit Maheshwari&lt;/A&gt; &lt;/P&gt;&lt;PRE&gt;"WARN tools.DistCp: Failed to compute snapshot diff on hdfs://hadoop.hortonworks.com:8020/tmp/sbandaru"&lt;/PRE&gt;&lt;P&gt;The line above is part of the error message, and the path in it is my target location. Why is it trying to find the snapshot in the target location?&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 21:52:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182030#M58622</guid>
      <dc:creator>bandarusridhar1</dc:creator>
      <dc:date>2017-03-31T21:52:39Z</dc:date>
    </item>
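    <!--
A likely reading of the warning quoted above: the Apache Hadoop DistCp guide lists, among the conditions for -diff, that the target holds the same from-snapshot as the source and has not changed since it was taken, so distcp also asks the target for a snapshot diff. The following is a minimal pre-flight sketch under that assumption, not a definitive check. The names has_snapshot, preflight and RUN are hypothetical helpers, not part of Hadoop; by default the script only prints the hdfs commands it would run.

```shell
#!/bin/sh
# RUN is a hypothetical dry-run switch: when unset it defaults to "echo",
# so commands are only printed. Export RUN="" to execute them for real.
RUN="${RUN-echo}"

# Succeeds if directory $1 has a snapshot named $2.
# In dry-run mode this only prints the command and always succeeds.
has_snapshot() {
  $RUN hdfs dfs -test -d "$1/.snapshot/$2"
}

# preflight SRC TGT FROM TO
# Checks the snapshots that "distcp -update -diff FROM TO SRC TGT" relies on:
# FROM and TO on the source, and FROM on the target.
preflight() {
  src="$1"; tgt="$2"; from="$3"; to="$4"
  has_snapshot "$src" "$from" || { echo "source has no snapshot $from" >&2; return 1; }
  has_snapshot "$src" "$to"   || { echo "source has no snapshot $to" >&2; return 1; }
  has_snapshot "$tgt" "$from" || { echo "target has no snapshot $from" >&2; return 1; }
}
```

For the paths in this thread, calling preflight /user/sbandaru /tmp/sbandaru sbandaru afterdistcp would test /tmp/sbandaru/.snapshot/sbandaru last, which is the snapshot the error message reports as missing.
    -->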
    <item>
      <title>Re: distcp update difference between two snapshot syntax</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182031#M58623</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/5746/bandarusridhar1.html" nodeid="5746"&gt;@SBandaru&lt;/A&gt; - Below is an excellent article on HCC explaining distcp with Snapshots:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/articles/71775/managing-hadoop-dr-with-distcp-and-snapshots.html" target="_blank"&gt;https://community.hortonworks.com/articles/71775/managing-hadoop-dr-with-distcp-and-snapshots.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;From the article:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Source must support 'snapshots'&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;hdfs dfsadmin -allowSnapshot &amp;lt;path&amp;gt;&lt;/PRE&gt;&lt;UL&gt;&lt;LI&gt;Target is "read-only"&lt;/LI&gt;&lt;LI&gt;Target, after initial baseline 'distcp' sync, needs to support snapshots.&lt;/LI&gt;&lt;/UL&gt;&lt;H2&gt;Process&lt;/H2&gt;&lt;UL&gt;&lt;LI&gt;Identify the source and target 'parent' directory&lt;UL&gt;&lt;LI&gt;Do not initially create the destination directory, allow the first distcp to do that. For example: If I want to sync source `/data/a` with `/data/a_target`, do *NOT* pre-create the 'a_target' directory.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;Allow snapshots on the source directory&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;hdfs dfsadmin -allowSnapshot /data/a&lt;/PRE&gt;&lt;UL&gt;&lt;LI&gt;Create a Snapshot of /data/a&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;hdfs dfs -createSnapshot /data/a s1&lt;/PRE&gt;&lt;UL&gt;&lt;LI&gt;Distcp the baseline copy (from the atomic snapshot). Note: /data/a_target does NOT exist prior to the following command.&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;hadoop distcp /data/a/.snapshot/s1 /data/a_target&lt;/PRE&gt;&lt;UL&gt;&lt;LI&gt;Allow snapshots on the newly created target directory&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;hdfs dfsadmin -allowSnapshot /data/a_target&lt;/PRE&gt;&lt;UL&gt;&lt;LI&gt;At this point /data/a_target should be considered "read-only". Do NOT make any changes to the content here.&lt;/LI&gt;&lt;LI&gt;Create a snapshot in /data/a_target whose name matches the snapshot used to build the baseline&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;hdfs dfs -createSnapshot /data/a_target s1&lt;/PRE&gt;&lt;UL&gt;&lt;LI&gt;Add some content to the source directory /data/a. Make changes, adds, deletes, etc. that need to be replicated to /data/a_target.&lt;/LI&gt;&lt;LI&gt;Take a new snapshot of /data/a&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;hdfs dfs -createSnapshot /data/a s2&lt;/PRE&gt;&lt;UL&gt;&lt;LI&gt;Just for fun, check on what's changed between the two snapshots&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;hdfs snapshotDiff /data/a s1 s2&lt;/PRE&gt;&lt;UL&gt;&lt;LI&gt;Ok, now let's migrate the changes to /data/a_target&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;hadoop distcp -diff s1 s2 -update /data/a /data/a_target&lt;/PRE&gt;&lt;UL&gt;&lt;LI&gt;When that's completed, finish the cycle by creating a matching snapshot on /data/a_target&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;hdfs dfs -createSnapshot /data/a_target s2&lt;/PRE&gt;&lt;P&gt;That's it. You've completed the cycle. Rinse and repeat.&lt;/P&gt;</description>
      <pubDate>Sat, 01 Apr 2017 06:47:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182031#M58623</guid>
      <dc:creator>namaheshwari</dc:creator>
      <dc:date>2017-04-01T06:47:08Z</dc:date>
    </item>
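    <!--
The repeatable part of the cycle quoted above (new source snapshot, diff report, incremental distcp, matching target snapshot) can be sketched as one shell function. This is a dry-run sketch under stated assumptions, not the article's own script: sync_cycle and RUN are hypothetical names, the baseline copy and the first matching snapshot are assumed to exist already, and by default the commands are only printed.

```shell
#!/bin/sh
# RUN is a hypothetical dry-run switch: when unset it defaults to "echo",
# so commands are only printed. Export RUN="" to execute them for real.
RUN="${RUN-echo}"

# sync_cycle SRC TGT FROM TO
# FROM is the snapshot both sides already share; TO is the new snapshot name.
# Each step runs only if the previous one succeeded.
sync_cycle() {
  src="$1"; tgt="$2"; from="$3"; to="$4"
  $RUN hdfs dfs -createSnapshot "$src" "$to" &&
  $RUN hdfs snapshotDiff "$src" "$from" "$to" &&
  $RUN hadoop distcp -update -diff "$from" "$to" "$src" "$tgt" &&
  $RUN hdfs dfs -createSnapshot "$tgt" "$to"
}
```

Calling sync_cycle /data/a /data/a_target s1 s2 prints the four commands for the first incremental round; the next round would pass s2 as FROM and a new name such as s3 as TO.
    -->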
    <item>
      <title>Re: distcp update difference between two snapshot syntax</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182032#M58624</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/5746/bandarusridhar1.html" nodeid="5746"&gt;@SBandaru&lt;/A&gt; - Is your issue resolved, or do you need any further help here?&lt;/P&gt;</description>
      <pubDate>Thu, 06 Apr 2017 04:02:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182032#M58624</guid>
      <dc:creator>namaheshwari</dc:creator>
      <dc:date>2017-04-06T04:02:33Z</dc:date>
    </item>
    <item>
      <title>Re: distcp update difference between two snapshot syntax</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182033#M58625</link>
      <description>&lt;P&gt;I have the same issue when trying to compute the diff.&lt;/P&gt;&lt;PRE&gt;hadoop distcp -diff s1 s2 -update /data/a /data/a_target&lt;/PRE&gt;&lt;P&gt;/data/a_target is on another cluster. s1 (yesterday's snapshot) and s2 (today's snapshot) sit side by side on the first cluster, of course. I wonder if -diff needs only the snapshot name, and not the absolute path.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Sep 2017 23:56:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182033#M58625</guid>
      <dc:creator>mtdeguzis</dc:creator>
      <dc:date>2017-09-11T23:56:13Z</dc:date>
    </item>
    <item>
      <title>Re: distcp update difference between two snapshot syntax</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182034#M58626</link>
      <description>&lt;P&gt;Hmm... so it does appear that you need to provide just the snapshot name for s1 and s2. Interesting.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Sep 2017 00:08:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182034#M58626</guid>
      <dc:creator>mtdeguzis</dc:creator>
      <dc:date>2017-09-12T00:08:50Z</dc:date>
    </item>
    <item>
      <title>Re: distcp update difference between two snapshot syntax</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182035#M58627</link>
      <description>&lt;P&gt;I am facing a problem: when I run snapshotDiff from a remote cluster, it fails with a "snapshot not found" error even though the snapshot exists. Is there a solution for this? We built a DR cluster and run distcp from the DR side so that the copy uses DR resources instead of overloading PROD. How can this be achieved?&lt;/P&gt;</description>
      <pubDate>Mon, 16 Jul 2018 18:33:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/distcp-update-difference-between-two-snapshot-syntax/m-p/182035#M58627</guid>
      <dc:creator>Sreedhar_ch</dc:creator>
      <dc:date>2018-07-16T18:33:43Z</dc:date>
    </item>
  </channel>
</rss>

