Created 03-26-2017 08:06 AM
Hi,
I had a DistCp job that was working between the active farm and the DR farm. Recently I upgraded the DR farm to 5.10.0 while the active farm is still on 5.5.4. When I run:
hadoop distcp -Dmapreduce.job.name=reporting -update -p -m 80 -strategy dynamic -diff s0 s1 hdfs://${SRC_SITE}/liveperson/data/server_live-engage-mr /liveperson/data/remote/DC=${DC}/server_live-engage-mr/
I get the error:
17/03/26 11:03:31 ERROR tools.DistCp: Exception encountered
java.lang.Exception: DistCp sync failed, input options: DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=true, useRdiff=false, fromSnapshot=s0, toSnapshot=s1, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=80, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='dynamic', preserveStatus=[REPLICATION, BLOCKSIZE, USER, GROUP, PERMISSION, CHECKSUMTYPE, TIMES], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[hdfs://AlphaProd/liveperson/data/server_live-engage-mr/.snapshot/s1], targetPath=/liveperson/data/remote/DC=Alpha/server_live-engage-mr, targetPathExists=true, filtersFile='null'}
at org.apache.hadoop.tools.DistCp.prepareFileListing(DistCp.java:84)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:179)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:141)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:441)
I changed hdfs:// to webhdfs:// since the two clusters are now on different versions:
hadoop distcp -Dmapreduce.job.name=reporting -update -p -m 80 -strategy dynamic -diff s0 s1 webhdfs://${SRC_SITE}/liveperson/data/server_live-engage-mr /liveperson/data/remote/DC=${DC}/server_live-engage-mr/
and now I'm getting the following error:
17/03/26 11:05:42 INFO client.RMProxy: Connecting to ResourceManager at aoor-mhc101.lpdomain.com/10.26.180.76:8032
17/03/26 11:05:42 ERROR tools.DistCp: Exception encountered
java.lang.IllegalArgumentException: The FileSystems needs to be DistributedFileSystem for using snapshot-diff-based distcp
at org.apache.hadoop.tools.DistCpSync.preSyncCheck(DistCpSync.java:98)
at org.apache.hadoop.tools.DistCpSync.sync(DistCpSync.java:147)
at org.apache.hadoop.tools.DistCp.prepareFileListing(DistCp.java:81)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:179)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:141)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:441)
Created 03-29-2017 09:08 AM
Now I'm getting a different error:
17/03/29 11:40:08 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=true, useRdiff=false, fromSnapshot=s0, toSnapshot=s1, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=100, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='dynamic', preserveStatus=[REPLICATION, BLOCKSIZE, USER, GROUP, PERMISSION, CHECKSUMTYPE, TIMES], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[hdfs://AlphaProd/liveperson/data/server_live-engage-mr/output], targetPath=hdfs://AlphaDR/liveperson/data/remote/DC=Alpha/server_live-engage-mr/output, targetPathExists=true, filtersFile='null'}
17/03/29 11:40:09 WARN tools.DistCp: The target has been modified since snapshot s0
17/03/29 11:40:09 ERROR tools.DistCp: Exception encountered
java.lang.Exception: DistCp sync failed, input options: DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=true, useRdiff=false, fromSnapshot=s0, toSnapshot=s1, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=100, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='dynamic', preserveStatus=[REPLICATION, BLOCKSIZE, USER, GROUP, PERMISSION, CHECKSUMTYPE, TIMES], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[hdfs://AlphaProd/liveperson/data/server_live-engage-mr/output/.snapshot/s1], targetPath=hdfs://AlphaDR/liveperson/data/remote/DC=Alpha/server_live-engage-mr/output, targetPathExists=true, filtersFile='null'}
at org.apache.hadoop.tools.DistCp.prepareFileListing(DistCp.java:84)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:179)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:141)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:441)
Created 03-29-2017 12:06 PM
Rolling back the CDH version solves the issue, so I'm confused, especially since I'm unable to find any documentation of the latest DistCp improvements in CDH 5.10.
Created 03-30-2017 05:32 AM
With debug level enabled:
17/03/30 08:30:19 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of successful kerberos logins and latency (milliseconds)], about=, type=DEFAULT, always=false, sampleName=Ops)
17/03/30 08:30:19 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of failed kerberos logins and latency (milliseconds)], about=, type=DEFAULT, always=false, sampleName=Ops)
17/03/30 08:30:19 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[GetGroups], about=, type=DEFAULT, always=false, sampleName=Ops)
17/03/30 08:30:19 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
17/03/30 08:30:19 DEBUG security.SecurityUtil: Setting hadoop.security.token.service.use_ip to true
17/03/30 08:30:19 DEBUG security.Groups: Creating new Groups object
17/03/30 08:30:19 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000; warningDeltaMs=5000
17/03/30 08:30:19 DEBUG security.UserGroupInformation: hadoop login
17/03/30 08:30:19 DEBUG security.UserGroupInformation: hadoop login commit
17/03/30 08:30:19 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: cloudera-scm
17/03/30 08:30:19 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: cloudera-scm" with name cloudera-scm
17/03/30 08:30:19 DEBUG security.UserGroupInformation: User entry: "cloudera-scm"
17/03/30 08:30:19 DEBUG security.UserGroupInformation: Assuming keytab is managed externally since logged in from subject.
17/03/30 08:30:19 DEBUG security.UserGroupInformation: UGI loginUser:cloudera-scm (auth:SIMPLE)
17/03/30 08:30:19 DEBUG core.Tracer: sampler.classes = ; loaded no samplers
17/03/30 08:30:19 DEBUG core.Tracer: span.receiver.classes = ; loaded no span receivers
17/03/30 08:30:19 DEBUG azure.NativeAzureFileSystem: finalize() called.
17/03/30 08:30:19 DEBUG azure.NativeAzureFileSystem: finalize() called.
17/03/30 08:30:19 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
17/03/30 08:30:19 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = false
17/03/30 08:30:19 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
17/03/30 08:30:19 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /hadoop/sockets/hdfs-sockets/dn
17/03/30 08:30:20 DEBUG hdfs.HAUtil: No HA service delegation token found for logical URI hdfs://AlphaDR/liveperson/data/remote/DC=Alpha/server_live-engage-mr/output
17/03/30 08:30:20 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
17/03/30 08:30:20 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = false
17/03/30 08:30:20 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
17/03/30 08:30:20 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /hadoop/sockets/hdfs-sockets/dn
17/03/30 08:30:20 DEBUG retry.RetryUtils: multipleLinearRandomRetry = null
17/03/30 08:30:20 DEBUG ipc.Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@550049b6
17/03/30 08:30:20 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@44ff60de
17/03/30 08:30:20 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
17/03/30 08:30:20 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
17/03/30 08:30:20 DEBUG unix.DomainSocketWatcher: org.apache.hadoop.net.unix.DomainSocketWatcher$2@258cde9c: starting with interruptCheckPeriodMs = 60000
17/03/30 08:30:20 DEBUG util.PerformanceAdvisory: Both short-circuit local reads and UNIX domain socket are disabled.
17/03/30 08:30:20 DEBUG sasl.DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
17/03/30 08:30:20 DEBUG ipc.Client: The ping interval is 60000 ms.
17/03/30 08:30:20 DEBUG ipc.Client: Connecting to aoor-mhc102.lpdomain.com/10.26.180.77:8020
17/03/30 08:30:20 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm: starting, having connections 1
17/03/30 08:30:20 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm sending #0
17/03/30 08:30:20 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm got value #0
17/03/30 08:30:20 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 55ms
17/03/30 08:30:20 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=true, useRdiff=false, fromSnapshot=s0, toSnapshot=s1, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=100, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='dynamic', preserveStatus=[REPLICATION, BLOCKSIZE, USER, GROUP, PERMISSION, CHECKSUMTYPE, TIMES], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[hdfs://AlphaProd/liveperson/data/server_live-engage-mr/output], targetPath=hdfs://AlphaDR/liveperson/data/remote/DC=Alpha/server_live-engage-mr/output, targetPathExists=true, filtersFile='null'}
17/03/30 08:30:20 DEBUG mapreduce.Cluster: Trying ClientProtocolProvider : org.apache.hadoop.mapred.YarnClientProtocolProvider
17/03/30 08:30:20 DEBUG service.AbstractService: Service: org.apache.hadoop.mapred.ResourceMgrDelegate entered state INITED
17/03/30 08:30:20 DEBUG service.AbstractService: Service: org.apache.hadoop.yarn.client.api.impl.YarnClientImpl entered state INITED
17/03/30 08:30:20 DEBUG security.UserGroupInformation: PrivilegedAction as:cloudera-scm (auth:SIMPLE) from:org.apache.hadoop.yarn.client.RMProxy.getProxy(RMProxy.java:136)
17/03/30 08:30:20 DEBUG ipc.YarnRPC: Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
17/03/30 08:30:20 DEBUG ipc.HadoopYarnProtoRPC: Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.api.ApplicationClientProtocol
17/03/30 08:30:20 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@44ff60de
17/03/30 08:30:20 DEBUG service.AbstractService: Service org.apache.hadoop.yarn.client.api.impl.YarnClientImpl is started
17/03/30 08:30:20 DEBUG service.AbstractService: Service org.apache.hadoop.mapred.ResourceMgrDelegate is started
17/03/30 08:30:21 DEBUG security.UserGroupInformation: PrivilegedAction as:cloudera-scm (auth:SIMPLE) from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:335)
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /hadoop/sockets/hdfs-sockets/dn
17/03/30 08:30:21 DEBUG hdfs.HAUtil: No HA service delegation token found for logical URI hdfs://AlphaDR
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /hadoop/sockets/hdfs-sockets/dn
17/03/30 08:30:21 DEBUG retry.RetryUtils: multipleLinearRandomRetry = null
17/03/30 08:30:21 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@44ff60de
17/03/30 08:30:21 DEBUG sasl.DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
17/03/30 08:30:21 DEBUG crypto.OpensslAesCtrCryptoCodec: Using org.apache.hadoop.crypto.random.OsSecureRandom as random number generator.
17/03/30 08:30:21 DEBUG util.PerformanceAdvisory: Using crypto codec org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec.
17/03/30 08:30:21 DEBUG mapreduce.Cluster: Picked org.apache.hadoop.mapred.YarnClientProtocolProvider as the ClientProtocolProvider
17/03/30 08:30:21 DEBUG mapred.ResourceMgrDelegate: getStagingAreaDir: dir=/user/cloudera-scm/.staging
17/03/30 08:30:21 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm sending #1
17/03/30 08:30:21 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm got value #1
17/03/30 08:30:21 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 2ms
17/03/30 08:30:21 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm sending #2
17/03/30 08:30:21 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm got value #2
17/03/30 08:30:21 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 1ms
17/03/30 08:30:21 DEBUG tools.DistCp: Meta folder location: /user/cloudera-scm/.staging/_distcp425151441
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /hadoop/sockets/hdfs-sockets/dn
17/03/30 08:30:21 DEBUG hdfs.HAUtil: No HA service delegation token found for logical URI hdfs://AlphaProd/liveperson/data/server_live-engage-mr/output
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
17/03/30 08:30:21 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /hadoop/sockets/hdfs-sockets/dn
17/03/30 08:30:21 DEBUG retry.RetryUtils: multipleLinearRandomRetry = null
17/03/30 08:30:21 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@44ff60de
17/03/30 08:30:21 DEBUG sasl.DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
17/03/30 08:30:21 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm sending #3
17/03/30 08:30:22 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm got value #3
17/03/30 08:30:22 DEBUG ipc.ProtobufRpcEngine: Call: getSnapshotDiffReport took 1222ms
17/03/30 08:30:22 WARN tools.DistCp: The target has been modified since snapshot s0
17/03/30 08:30:22 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm sending #4
17/03/30 08:30:22 DEBUG ipc.Client: IPC Client (1983511229) connection to aoor-mhc102.lpdomain.com/10.26.180.77:8020 from cloudera-scm got value #4
17/03/30 08:30:22 DEBUG ipc.ProtobufRpcEngine: Call: delete took 1ms
17/03/30 08:30:22 ERROR tools.DistCp: Exception encountered
java.lang.Exception: DistCp sync failed, input options: DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=true, useRdiff=false, fromSnapshot=s0, toSnapshot=s1, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=100, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='dynamic', preserveStatus=[REPLICATION, BLOCKSIZE, USER, GROUP, PERMISSION, CHECKSUMTYPE, TIMES], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[hdfs://AlphaProd/liveperson/data/server_live-engage-mr/output/.snapshot/s1], targetPath=hdfs://AlphaDR/liveperson/data/remote/DC=Alpha/server_live-engage-mr/output, targetPathExists=true, filtersFile='null'}
at org.apache.hadoop.tools.DistCp.prepareFileListing(DistCp.java:84)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:179)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:141)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:441)
17/03/30 08:30:22 DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@44ff60de
17/03/30 08:30:22 DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@44ff60de
+ '[' 25 -eq 0 ']'
Created 03-30-2017 09:47 AM
Pleeeeeeeeeeeeeeeease help.
Created 03-30-2017 04:13 PM
Created on 03-30-2017 05:15 PM - edited 03-30-2017 06:07 PM
Hi Harsh,
But as you can see in my last comments, I'm just using hdfs:// and still get the same error, and when I roll back CDH to 5.9.0 it works again.
The error occurs when the target has been modified since snapshot s0, which means that once a snapshot-based DistCp run fails one time, it stops working on every subsequent run.
Can you also point me to documentation on how to use the snapshot restore, as my main reason for upgrading was to use this capability in my DistCp?
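Whether the target really has diverged from s0 can be confirmed with the hdfs snapshotDiff command, comparing the snapshot against the directory's current state (a sketch, assuming the target path from the logs above):

```shell
# Compare snapshot s0 of the destination against its current state ("."
# denotes the live directory). Any M/+/-/R lines in the output mean the
# target was modified since s0, which is exactly the condition that
# makes a snapshot-based distcp refuse to sync.
hdfs snapshotDiff /liveperson/data/remote/DC=Alpha/server_live-engage-mr/output s0 .
```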
Created 03-31-2017 11:07 AM
@Harsh J I hope you can help me with this.
Created 04-10-2017 05:03 AM
@Harsh J I just need a little help with the command, as it isn't mentioned in the documentation or in the command-line help.
Suppose I want to back up source_folder on the active farm to destination_folder on the disaster-recovery farm, and I want to run -rdiff on destination_folder.
The source folder has snapshots s0 and s1, and the destination has snapshot s0 but has already been modified by the partially completed DistCp run that failed.
So in the current state, destination_folder contains some files that are not in its snapshot s0, and I want to revert it to s0, so I created s1 at the destination. What should the revert command look like:
hadoop distcp -rdiff s1 s0 destination_folder source_folder, or hadoop distcp -rdiff s1 s0 source_folder destination_folder?
I assume s1 and s0 refer to snapshots of the destination folder, is that right?
Created 04-17-2017 12:30 AM
I got it.
The right order is: hadoop distcp -update -rdiff s1 s0 source destination
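Putting the pieces together, the full revert sequence might look like the following sketch (source_folder and destination_folder are placeholder paths; the destination is assumed to still hold the pre-failure snapshot s0, and -rdiff, like -diff, has to be combined with -update):

```shell
# 1. Capture the current (partially copied, "dirty") state of the
#    destination as snapshot s1.
hdfs dfs -createSnapshot /destination_folder s1

# 2. Revert the destination from s1 back to s0. The reverse diff is
#    computed on the destination; any data needed for the revert is
#    pulled from the source, which must still hold snapshot s0.
hadoop distcp -update -rdiff s1 s0 /source_folder /destination_folder
```

After a successful -rdiff run the destination's contents match its snapshot s0 again, so the regular -diff replication can resume from s0.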
Created 04-17-2017 12:44 AM
So, to make things clearer and more useful for anyone who is going to use this feature, or who is already using DistCp -diff on their current CDH version:
1- If you are using the snapshot diff on a version prior to CDH 5.10 (or CDH 5.9.1), DistCp was able to recover from a failure by listing the entire source directory and re-running the copy from scratch. In the newer versions, DistCp no longer recovers when a run fails or is interrupted, and every subsequent run will fail as well.
2- To recover from this, you have to use the snapshot restore (-rdiff) to revert your destination HDFS folder to its state before the DistCp failure.
3- The restore command should look like this: hadoop distcp -update -rdiff s1 s0 source_folder destination_folder, where s1 is a snapshot at the destination that is newer than s0. After the -rdiff run succeeds, the destination is restored to s0, the state before the DistCp failure.
4- The most challenging part is managing the snapshot lifecycle across the -diff and -rdiff runs.
Here is an example of how I'm doing this (I'm still working on enhancing it):
========================================
#!/bin/bash -x
# Take a new snapshot on the source and attempt an incremental copy
# based on the diff between the previous snapshot (s0) and the new
# one (s1). Note: -diff requires -update.
hdfs dfs -createSnapshot /fawzesource s1
if hadoop distcp -update -diff s0 s1 /fawzesource /fawzedestination
then
    # Success: snapshot the destination at the new state and rotate the
    # snapshots so the next run can again diff s0 against a fresh s1.
    hdfs dfs -createSnapshot /fawzedestination s1
    hdfs dfs -deleteSnapshot /fawzesource s0
    hdfs dfs -renameSnapshot /fawzesource s1 s0
    hdfs dfs -deleteSnapshot /fawzedestination s0
    hdfs dfs -renameSnapshot /fawzedestination s1 s0
else
    # Failure: the destination may be partially modified. Capture its
    # current (dirty) state as s2, then use -rdiff to revert it to s0.
    # (The first deleteSnapshot fails harmlessly if s2 does not exist yet.)
    hdfs dfs -deleteSnapshot /fawzedestination s2
    hdfs dfs -createSnapshot /fawzedestination s2
    if hadoop distcp -update -rdiff s2 s0 /fawzesource /fawzedestination
    then
        # Revert succeeded: recreate s0 to reflect the restored state,
        # and drop the stale source s1 so the next run can recreate it.
        hdfs dfs -deleteSnapshot /fawzedestination s2
        hdfs dfs -deleteSnapshot /fawzedestination s0
        hdfs dfs -createSnapshot /fawzedestination s0
        hdfs dfs -deleteSnapshot /fawzesource s1
    fi
fi