Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Explorer

Hi,

 

We use snapshot-based DistCp. If the application is killed or fails for any reason, the destination is left with the distcp tmp file, and the next run recognises that the destination was modified and lists all the files in the snapshottable path.

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Rising Star

Hi, Fawzea. By DistCp snapshot, do you mean using distcp to back up some files as a way of snapshotting? I don't recall DistCp providing a concept of snapshots itself. Also, can you be more specific about what you expected to happen, and what actually happened that you think is wrong?

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Cloudera Employee

Hi Fawzea, one workaround is to restore the snapshot in the target directory. My colleague Yongjun is working on a more elegant solution. He may be able to give better insight later.

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Explorer

Hi, we use DistCp to back up our active HDFS farm to the DR farm.

What I did: created s0 on SRC_SITE, ran distcp, then created s0 on DEST_SITE (I did this step manually).

Then I wrote the scheduled crontab job below:

hdfs dfs -createSnapshot hdfs://${SRC_SITE} s1

hadoop distcp -update -p -diff s0 s1 hdfs://${SRC_SITE} hdfs://${DEST_SITE}

if [ $? -eq 0 ]; then

hdfs dfs -createSnapshot hdfs://${DEST_SITE} s1

hdfs dfs -deleteSnapshot hdfs://${SRC_SITE} s0

hdfs dfs -renameSnapshot hdfs://${SRC_SITE} s1 s0

hdfs dfs -deleteSnapshot hdfs://${DEST_SITE} s0

hdfs dfs -renameSnapshot hdfs://${DEST_SITE} s1 s0

fi

rm -fv ${LOCKFILE}

The problem: when DistCp is killed, DEST_SITE is left with the distcp.tmp file, and on the next run distcp recognises that DEST_SITE was modified and behaves like the initial s0 copy, listing everything again. I'm using renameSnapshot and deleteSnapshot in order to avoid accumulating many snapshots.

At which step can I restore DEST_SITE to the s0 state it had before the failure?

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Cloudera Employee

Hi Fawzea,

DEST_SITE should be restored to s0 before the next round of distcp. To do that, you can just delete the tmp files, if any, or use the snapshot rollback function. But be aware that the snapshot rollback function might not be available at this point.

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Explorer

No, deleting the tmp files will not help, since the tmp files' metadata is stored in the snapshot at the destination. BTW, we tested deleting the tmp files and it didn't help.

When will snapshot rollback be available?

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Cloudera Employee

Are you sure you have deleted everything? It's OK if the file metadata
changed. As long as the snapshot diff report is empty, distcp copies
only the incremental changes. To make sure of that, after deleting the tmp
files, you can create a new snapshot named s0 on DEST.
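
The empty-diff condition described above can be checked before each run. The following is a minimal sketch, not a tested procedure: `count_diff_entries` and `dest_unchanged_since` are hypothetical helper names, and it assumes that `hdfs snapshotDiff` accepts `.` for the current state of the directory and prints one line per change starting with `M`, `+`, `-`, or `R`.

```shell
#!/usr/bin/env bash

# Count the change entries in an `hdfs snapshotDiff` report read from stdin.
# Assumption: each entry line starts with M, +, - or R followed by whitespace;
# the "Difference between snapshot ..." header line is not counted.
count_diff_entries() {
  # grep -c exits non-zero when the count is 0, so mask that exit status.
  grep -c -E '^[MR+-][[:space:]]' || true
}

# Hypothetical helper: succeed only if nothing under dest_path changed
# since the given snapshot. Returns 2 if the report could not be produced.
dest_unchanged_since() {
  local dest_path="$1" snapshot="$2" report
  report="$(hdfs snapshotDiff "$dest_path" "$snapshot" .)" || return 2
  [ "$(printf '%s\n' "$report" | count_diff_entries)" -eq 0 ]
}
```

For example, `dest_unchanged_since "hdfs://${DEST_SITE}" s0 || echo "DEST changed since s0"` could guard the distcp step in the crontab.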

 

I don't know the ETA for snapshot rollback. Yongjun is actively working on a
similar solution. Hopefully he can get it done soon.

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Explorer
Yes, I'm sure we are deleting the tmp files, but since some mappers had already finished, DEST_SITE really was modified.

I modified my crontab as follows:

hdfs dfs -createSnapshot hdfs://${SRC_SITE} s1
hadoop distcp -update -p -diff s0 s1 hdfs://${SRC_SITE} hdfs://${DEST_SITE}
if [ $? -eq 0 ]; then
hdfs dfs -createSnapshot hdfs://${DEST_SITE} s1
hdfs dfs -deleteSnapshot hdfs://${SRC_SITE} s0
hdfs dfs -renameSnapshot hdfs://${SRC_SITE} s1 s0
hdfs dfs -deleteSnapshot hdfs://${DEST_SITE} s0
hdfs dfs -renameSnapshot hdfs://${DEST_SITE} s1 s0
else
hdfs dfs -rm -skipTrash hdfs://${DEST_SITE}/.distcp.tmp.*
hdfs dfs -deleteSnapshot hdfs://${DEST_SITE} s0
hdfs dfs -createSnapshot hdfs://${DEST_SITE} s0
fi
rm -fv ${LOCKFILE}

But I'm still testing it, to make sure we don't hit corner cases where we may lose data.
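
One way to exercise the snapshot rotation steps from the crontab above without touching either cluster is a dry run. This is only a sketch under stated assumptions: `HDFS_CMD` and `rotate_snapshot` are hypothetical names that are not part of Hadoop, and the function chains the two steps with `&&` so that a failed delete stops the rename.

```shell
#!/usr/bin/env bash

# HDFS_CMD is a hypothetical override: set HDFS_CMD=echo to print the
# commands instead of running them against a cluster.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# Rotate s1 into s0 under one snapshottable path.
# The rename only runs if the delete of the old s0 succeeded.
rotate_snapshot() {
  local path="$1"
  "$HDFS_CMD" dfs -deleteSnapshot "$path" s0 &&
  "$HDFS_CMD" dfs -renameSnapshot "$path" s1 s0
}
```

Running `HDFS_CMD=echo` and then `rotate_snapshot "hdfs://${DEST_SITE}"` prints the two commands in order, which makes it easier to review the sequence before scheduling it.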

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Cloudera Employee

Sure. Bear in mind that this might be a dangerous approach, since it is not tested at all. You can consider it a temporary workaround. To my knowledge, it's safer to use snapshot rollback.

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Explorer

We will wait for snapshot rollback for now. Does this mean that upgrading to Cloudera 5.7 will not solve our problem?

That means snapshot-based DistCp copy for disaster recovery isn't fully mature yet. Sad to hear that.