Cloudera Employee
Posts: 14
Registered: ‎03-07-2016

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Hi Fawze, I'm not aware of any doc for that.
Expert Contributor
Posts: 230
Registered: ‎01-25-2017

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Thanks,

Do you know whether I should manage the snapshot create/delete/rename operations myself
around the distcp, or whether that is handled as part of the restore process?

I don't feel confident using it without documentation of what it does and how to use it.
Cloudera Employee
Posts: 14
Registered: ‎03-07-2016

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Sorry, I don't know too many details. But you can always ping Yongjun. He is
the owner of HDFS-9820, and you can find his email address in the upstream
JIRA.
Expert Contributor
Posts: 230
Registered: ‎01-25-2017

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

@Yufei Gu 

 

On CDH 5.10.0, if a distcp running with snapshot diff fails for any reason, it no longer even falls back to running like a first (full) run. I tried to find documentation on the enhancements made to distcp and how to use them on CDH 5.10.0, with no success.

Cloudera Employee
Posts: 15
Registered: ‎08-20-2015

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Hi Fawze,

 

If using "distcp -diff" fails for some reason, we have two options:

 

1. Use the HDFS-9820 feature to rewind the target cluster to the state before the failed distcp, then run "distcp -diff" again.

2. Run distcp without -diff.

 

Hopefully option 1 is still faster, since we are only copying the delta over to the target.
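For example, roughly (just a sketch, not exact commands for your environment; the paths and the temporary snapshot name "tmp" are placeholders):

# Option 1: rewind the target to s0 with -rdiff (HDFS-9820), then retry the incremental copy
hdfs dfs -createSnapshot hdfs://<target-nn>/<path> tmp
hadoop distcp -update -rdiff tmp s0 hdfs://<source-nn>/<path> hdfs://<target-nn>/<path>
hdfs dfs -deleteSnapshot hdfs://<target-nn>/<path> tmp
hadoop distcp -update -diff s0 s1 hdfs://<source-nn>/<path> hdfs://<target-nn>/<path>

# Option 2: fall back to a plain distcp without -diff
hadoop distcp -update hdfs://<source-nn>/<path> hdfs://<target-nn>/<path>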

 

Thanks.

 

--Yongjun

 

Expert Contributor
Posts: 230
Registered: ‎01-25-2017

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run


And this is exactly my question: how do I use it, and which commands should be
used to restore the snapshot on the DR cluster?

Cloudera Employee
Posts: 15
Registered: ‎08-20-2015

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Hi Fawze,

Please refer to "-rdiff" section in

https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html

for usage. If you have further questions, please feel free to ask.
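In short, the general form is (just a sketch of the syntax; see the doc for the exact preconditions):

hadoop distcp -update -rdiff <newSnapshot> <oldSnapshot> <sourceDir> <targetDir>

where both snapshots exist on the target, nothing has changed on the target since <newSnapshot> was taken, and, when the source is a different path/cluster, <oldSnapshot> also exists on the source with the same content.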

For your reference,

https://issues.apache.org/jira/browse/HDFS-9820?focusedCommentId=15543394&page=com.atlassian.jira.pl...

Thanks.

--Yongjun
Expert Contributor
Posts: 230
Registered: ‎01-25-2017

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run


@Yongjun Zhang Thanks for the response and your readiness to help.

 

After the first run, I created s0 at both the source and the destination, then I created a distcp script as follows:

1- Create s1 at the source:

hdfs dfs -createSnapshot hdfs://${SRC_SITE}/liveperson/data/server_live-engage-mr/output s1

2- Issue a distcp diff between s0 and s1:
hadoop distcp -Dmapreduce.job.name=Reporting -Dmapred.job.queue.name=distcp.reportingcopy -update -strategy dynamic -p -m 50 -diff s0 s1 hdfs://${SRC_SITE}/liveperson/data/server_live-engage-mr/output hdfs://${DEST_SITE}/liveperson/data/remote/DC=VA/server_live-engage-mr/output

 

3- In case the distcp succeeds:

if [ $? -eq 0 ]
then

 

4- Create s1 at the destination:
hdfs dfs -createSnapshot hdfs://${DEST_SITE}/liveperson/data/remote/DC=VA/server_live-engage-mr/output s1

5- Delete s0 at both the source and the destination, and rename s1 to s0, so that both clusters have s0 again:

hdfs dfs -deleteSnapshot hdfs://${SRC_SITE}/liveperson/data/server_live-engage-mr/output s0
hdfs dfs -renameSnapshot hdfs://${SRC_SITE}/liveperson/data/server_live-engage-mr/output s1 s0
hdfs dfs -deleteSnapshot hdfs://${DEST_SITE}/liveperson/data/remote/DC=VA/server_live-engage-mr/output s0
hdfs dfs -renameSnapshot hdfs://${DEST_SITE}/liveperson/data/remote/DC=VA/server_live-engage-mr/output s1 s0

fi

 

My question:

 

Should I use -rdiff only if the distcp fails, or should I just replace the current -diff with it? If it is only for when the distcp fails, then my script would have:

 

hadoop distcp -Dmapreduce.job.name=Reporting -Dmapred.job.queue.name=distcp.reportingcopy -update -strategy dynamic -p -m 50 -rdiff s0 s1 hdfs://${SRC_SITE}/liveperson/data/server_live-engage-mr/output hdfs://${DEST_SITE}/liveperson/data/remote/DC=VA/server_live-engage-mr/output

 

and this section should only run when I get the message that the destination has been modified since snapshot s0.

 

But I have a problem: at the destination I only have one snapshot at any given time, and according to the documentation, -rdiff only works if there are two snapshots at the destination.

 

My second question: why did distcp diff stop working after I upgraded the cluster in the cases where a previous distcp had failed? I expected the existing distcp-with-snapshot usage to be forward compatible with higher versions. In 5.10.0 the distcp diff fails if the destination was changed since snapshot s0, which is the typical case for a distcp run after any network or environment issue caused a distcp to fail, whereas in 5.9.0, if the target had changed since snapshot s0, it listed all the source files and ran a regular distcp.

 

Now I need to do a lot of work to get -rdiff working as expected and to cover all the edge cases.

 

Any help is so much appreciated.

 

 



Expert Contributor
Posts: 230
Registered: ‎01-25-2017

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

@Yongjun Zhang Sorry, I missed your comment that I should use -rdiff and then run the regular distcp -diff. I would be happy if you could comment on the way I used it, as described in my last post on this issue.

 

Also, this is the error I get when I run it on 5.10.0 with debug-level logging:

 

====================

 

With Debug level:

 

17/03/30 08:30:19 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of successful kerberos logins and latency (milliseconds)], about=, type=DEFAULT, always=false, sampleName=Ops)
17/03/30 08:30:19 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of failed kerberos logins and latency (milliseconds)], about=, type=DEFAULT, always=false, sampleName=Ops)
17/03/30 08:30:19 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[GetGroups], about=, type=DEFAULT, always=false, sampleName=Ops)
17/03/30 08:30:19 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
17/03/30 08:30:19 DEBUG security.SecurityUtil: Setting hadoop.security.token.service.use_ip to true
17/03/30 08:30:19 DEBUG security.Groups: Creating new Groups object
17/03/30 08:30:19 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000; warningDeltaMs=5000
17/03/30 08:30:19 DEBUG security.UserGroupInformation: hadoop login
17/03/30 08:30:19 DEBUG security.UserGroupInformation: hadoop login commit
17/03/30 08:30:19 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: cloudera-scm
17/03/30 08:30:19 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: cloudera-scm" with name cloudera-scm
17/03/30 08:30:19 DEBUG security.UserGroupInformation: User entry: "cloudera-scm"
17/03/30 08:30:19 DEBUG security.UserGroupInformation: Assuming keytab is managed externally since logged in from subject.
17/03/30 08:30:19 DEBUG security.UserGroupInformation: UGI loginUser:cloudera-scm (auth:SIMPLE)
17/03/30 08:30:19 DEBUG core.Tracer: sampler.classes = ; loaded no samplers
17/03/30 08:30:19 DEBUG core.Tracer: span.receiver.classes = ; loaded no span receivers
17/03/30 08:30:19 DEBUG azure.NativeAzureFileSystem: finalize() called.
17/03/30 08:30:19 DEBUG azure.NativeAzureFileSystem: finalize() called.
17/03/30 08:30:19 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
17/03/30 08:30:19 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = false
17/03/30 08:30:19 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
17/03/30 08:30:19 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /hadoop/sockets/hdfs-sockets/dn
17/03/30 08:30:20 DEBUG hdfs.HAUtil: No HA service delegation token found for logical URI hdfs://AlphaDR/liveperson/data/remote/DC=Alpha/server_live-engage-mr/output
17/03/30 08:30:20 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
17/03/30 08:30:20 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = false
17/03/30 08:30:20 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
17/03/30 08:30:20 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /hadoop/sockets/hdfs-sockets/dn
17/03/30 08:30:20 DEBUG retry.RetryUtils: multipleLinearRandomRetry = null
17/03/30 08:30:20 DEBUG ipc.Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@550049b6
17/03/30 08:30:20 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@44ff60de
17/03/30 08:30:20 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
17/03/30 08:30:20 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
17/03/30 08:30:20 DEBUG unix.DomainSocketWatcher: org.apache.hadoop.net.unix.DomainSocketWatcher$2@258cde9c: starting with interruptCheckPeriodMs = 60000
17/03/30 08:30:20 DEBUG util.PerformanceAdvisory: Both short-circuit local reads and UNIX domain socket are disabled.
17/03/30 08:30:20 DEBUG sasl.DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
17/03/30 08:30:20 DEBUG ipc.Client: The ping interval is 60000 ms.
17/03/30 08:30:20 DEBUG ipc.Client: Connecting to aoor-mhc102.lpdomain.com/10.26.180.77:8020
17/03/30 08:30:20 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm: starting, having connections 1
17/03/30 08:30:20 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm sending #0
17/03/30 08:30:20 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm got value #0
17/03/30 08:30:20 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 55ms
17/03/30 08:30:20 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=true, useRdiff=false, fromSnapshot=s0, toSnapshot=s1, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=100, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='dynamic', preserveStatus=[REPLICATION, BLOCKSIZE, USER, GROUP, PERMISSION, CHECKSUMTYPE, TIMES], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[hdfs://AlphaProd/liveperson/data/server_live-engage-mr/output], targetPath=hdfs://AlphaDR/liveperson/data/remote/DC=Alpha/server_live-engage-mr/output, targetPathExists=true, filtersFile='null'}
17/03/30 08:30:20 DEBUG mapreduce.Cluster: Trying ClientProtocolProvider : org.apache.hadoop.mapred.YarnClientProtocolProvider
17/03/30 08:30:20 DEBUG service.AbstractService: Service: org.apache.hadoop.mapred.ResourceMgrDelegate entered state INITED
17/03/30 08:30:20 DEBUG service.AbstractService: Service: org.apache.hadoop.yarn.client.api.impl.YarnClientImpl entered state INITED
17/03/30 08:30:20 DEBUG security.UserGroupInformation: PrivilegedAction as:cloudera-scm (auth:SIMPLE) from:org.apache.hadoop.yarn.client.RMProxy.getProxy(RMProxy.java:136)
17/03/30 08:30:20 DEBUG ipc.YarnRPC: Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
17/03/30 08:30:20 DEBUG ipc.HadoopYarnProtoRPC: Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.api.ApplicationClientProtocol
17/03/30 08:30:20 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@44ff60de
17/03/30 08:30:20 DEBUG service.AbstractService: Service org.apache.hadoop.yarn.client.api.impl.YarnClientImpl is started
17/03/30 08:30:20 DEBUG service.AbstractService: Service org.apache.hadoop.mapred.ResourceMgrDelegate is started
17/03/30 08:30:21 DEBUG security.UserGroupInformation: PrivilegedAction as:cloudera-scm (auth:SIMPLE) from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:335)
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /hadoop/sockets/hdfs-sockets/dn
17/03/30 08:30:21 DEBUG hdfs.HAUtil: No HA service delegation token found for logical URI hdfs://AlphaDR
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /hadoop/sockets/hdfs-sockets/dn
17/03/30 08:30:21 DEBUG retry.RetryUtils: multipleLinearRandomRetry = null
17/03/30 08:30:21 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@44ff60de
17/03/30 08:30:21 DEBUG sasl.DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
17/03/30 08:30:21 DEBUG crypto.OpensslAesCtrCryptoCodec: Using org.apache.hadoop.crypto.random.OsSecureRandom as random number generator.
17/03/30 08:30:21 DEBUG util.PerformanceAdvisory: Using crypto codec org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec.
17/03/30 08:30:21 DEBUG mapreduce.Cluster: Picked org.apache.hadoop.mapred.YarnClientProtocolProvider as the ClientProtocolProvider
17/03/30 08:30:21 DEBUG mapred.ResourceMgrDelegate: getStagingAreaDir: dir=/user/cloudera-scm/.staging
17/03/30 08:30:21 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm sending #1
17/03/30 08:30:21 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm got value #1
17/03/30 08:30:21 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 2ms
17/03/30 08:30:21 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm sending #2
17/03/30 08:30:21 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm got value #2
17/03/30 08:30:21 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 1ms
17/03/30 08:30:21 DEBUG tools.DistCp: Meta folder location: /user/cloudera-scm/.staging/_distcp425151441
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /hadoop/sockets/hdfs-sockets/dn
17/03/30 08:30:21 DEBUG hdfs.HAUtil: No HA service delegation token found for logical URI hdfs://AlphaProd/liveperson/data/server_live-engage-mr/output
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /hadoop/sockets/hdfs-sockets/dn
17/03/30 08:30:21 DEBUG retry.RetryUtils: multipleLinearRandomRetry = null
17/03/30 08:30:21 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@44ff60de
17/03/30 08:30:21 DEBUG sasl.DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
17/03/30 08:30:21 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm sending #3
17/03/30 08:30:22 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm got value #3
17/03/30 08:30:22 DEBUG ipc.ProtobufRpcEngine: Call: getSnapshotDiffReport took 1222ms
17/03/30 08:30:22 WARN tools.DistCp: The target has been modified since snapshot s0
17/03/30 08:30:22 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm sending #4
17/03/30 08:30:22 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm got value #4
17/03/30 08:30:22 DEBUG ipc.ProtobufRpcEngine: Call: delete took 1ms
17/03/30 08:30:22 ERROR tools.DistCp: Exception encountered
java.lang.Exception: DistCp sync failed, input options: DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=true, useRdiff=false, fromSnapshot=s0, toSnapshot=s1, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=100, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='dynamic', preserveStatus=[REPLICATION, BLOCKSIZE, USER, GROUP, PERMISSION, CHECKSUMTYPE, TIMES], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[hdfs://AlphaProd/liveperson/data/server_live-engage-mr/output/.snapshot/s1], targetPath=hdfs://AlphaDR/liveperson/data/remote/DC=Alpha/server_live-engage-mr/output, targetPathExists=true, filtersFile='null'}
at org.apache.hadoop.tools.DistCp.prepareFileListing(DistCp.java:84)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:179)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:141)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:441)
17/03/30 08:30:22 DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@44ff60de
17/03/30 08:30:22 DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@44ff60de
+ '[' 25 -eq 0 ']'

Expert Contributor
Posts: 230
Registered: ‎01-25-2017

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run


These are the parameters that distcp runs with on 5.10.0 following a failed distcp:
java.lang.Exception: DistCp sync failed,
input options: DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, ignoreFailures=false,
overwrite=false, append=false, useDiff=true, useRdiff=false, fromSnapshot=s0, toSnapshot=s1, skipCRC=false,
blocking=true, numListstatusThreads=0, maxMaps=100, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='dynamic',
preserveStatus=[REPLICATION, BLOCKSIZE, USER, GROUP, PERMISSION, CHECKSUMTYPE, TIMES], preserveRawXattrs=false, atomicWorkPath=null,
logPath=null, sourceFileListing=null, sourcePaths=[/fawzesource/event_type=EnterSiteEvent/year=2016/.snapshot/s1],
targetPath=/fawzedestination/event_type=EnterSiteEvent/year=2016, targetPathExists=true, filtersFile='null'}

 

============================================

and this is what distcp runs with on 5.9.0 following a failed distcp:
17/04/03 08:54:46 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false,
ignoreFailures=false, overwrite=false, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=100, mapBandwidth=100,
sslConfigurationFile='null', copyStrategy='dynamic', preserveStatus=[REPLICATION, BLOCKSIZE, USER, GROUP, PERMISSION, CHECKSUMTYPE, TIMES],
preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[/fawzesource/event_type=EnterSiteEvent/year=2016],
targetPath=/fawzedestination/event_type=EnterSiteEvent/year=2016, targetPathExists=true, filtersFile='null'}

================================


I see a lot of differences and don't know their impact.


Also, regarding the prerequisites, I have a few questions:

"Use snapshot diff report between given two snapshots" (Q: should these two snapshots be on the source or on the destination? That is, for -rdiff <newSnapshot> <oldSnapshot>, where should the <newSnapshot> and the <oldSnapshot> that are passed to -rdiff live?) "to identify what has been changed on the target since the snapshot <oldSnapshot> was created on the target, and apply the diff reversely to the target, and copy modified files from the source's <oldSnapshot>, to make the target the same as <oldSnapshot>."

Q: If both the new and the old snapshot should be on the target, why do I have to pass both a source path and a destination path to distcp -rdiff?
Shouldn't the destination path alone be enough; why do I need the source path?


This option is valid only with -update option and the following conditions should be satisfied.
Both the source and the target FileSystem must be DistributedFileSystem. The source and the target can be two different clusters/paths,
or they can be exactly the same cluster/path. In the latter case, modified files are copied from target's <oldSnapshot> to target's current state.


Two snapshots <newSnapshot> and <oldSnapshot> have been created on the target FS, and <oldSnapshot> is older than <newSnapshot>. No change has been made on target since <newSnapshot> was created on the target.
(Q: Does that mean I should create s1 at the target before running the -rdiff, and then run -rdiff s1 s0? For example, something like the sketch below.)
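Just to show what I have in mind (only a sketch of my understanding of the docs, reusing the paths from my script above; here s1 on the target is a temporary snapshot of the partially-copied state):

hdfs dfs -createSnapshot hdfs://${DEST_SITE}/liveperson/data/remote/DC=VA/server_live-engage-mr/output s1
hadoop distcp -update -rdiff s1 s0 hdfs://${SRC_SITE}/liveperson/data/server_live-engage-mr/output hdfs://${DEST_SITE}/liveperson/data/remote/DC=VA/server_live-engage-mr/output
# after the rewind, drop the temporary snapshot and retry the regular incremental copy
hdfs dfs -deleteSnapshot hdfs://${DEST_SITE}/liveperson/data/remote/DC=VA/server_live-engage-mr/output s1
hadoop distcp -update -diff s0 s1 hdfs://${SRC_SITE}/liveperson/data/server_live-engage-mr/output hdfs://${DEST_SITE}/liveperson/data/remote/DC=VA/server_live-engage-mr/output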

 
