Explorer
Posts: 22
Registered: ‎04-12-2016

Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Hi,

 

We are using DistCp with snapshots. If the application is killed or fails for any reason, the destination is left with the distcp tmp file, and the next run detects that the destination has been modified and lists all the files under the snapshottable path.

Cloudera Employee
Posts: 55
Registered: ‎03-07-2016

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Hi, Fawzea. By DistCp snapshot, do you mean using distcp to back up some files as a way of snapshotting? I don't recall DistCp providing a concept of snapshots itself. Also, as for your question, can you be more specific about what you think is supposed to happen and what happened that you think is wrong?

Cloudera Employee
Posts: 13
Registered: ‎03-07-2016

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Hi Fawzea, one workaround would be to restore the snapshot in the target directory. My colleague Yongjun is working on a more elegant solution. He might give better insight later.

Explorer
Posts: 22
Registered: ‎04-12-2016

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run


Hi, we use DistCp to back up our active HDFS farm to the DR farm.

What I do: I created s0 on SRC_SITE, ran distcp, and then created s0 on DEST_SITE (I did this step manually).

Then I wrote the scheduled crontab job below:

hdfs dfs -createSnapshot hdfs://${SRC_SITE} s1
hadoop distcp -update -p -diff s0 s1 hdfs://${SRC_SITE} hdfs://${DEST_SITE}
if [ $? -eq 0 ]; then
hdfs dfs -createSnapshot hdfs://${DEST_SITE} s1
hdfs dfs -deleteSnapshot hdfs://${SRC_SITE} s0
hdfs dfs -renameSnapshot hdfs://${SRC_SITE} s1 s0
hdfs dfs -deleteSnapshot hdfs://${DEST_SITE} s0
hdfs dfs -renameSnapshot hdfs://${DEST_SITE} s1 s0
fi
rm -fv ${LOCKFILE}

The problem is that when DistCp is killed, DEST_SITE is left with the distcp.tmp file, and on the next run distcp detects that DEST_SITE was modified and runs like the initial s0 copy. I use renameSnapshot and deleteSnapshot to avoid accumulating many snapshots.

At which step can I restore s0 to its state before the failure?

Cloudera Employee
Posts: 13
Registered: ‎03-07-2016

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Hi Fawzea,

DEST_SITE should be restored to s0 before the next round of distcp. To do that, you can just delete the tmp files, if any, or use the snapshot rollback function. But be careful: the snapshot rollback function might not be available at this point.
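For the "delete the tmp files, if any" step, it can help to list what is actually left over before removing anything. A minimal sketch, assuming the leftover names start with .distcp.tmp. as in this thread and that hdfs dfs -ls prints the path starting at the eighth column; list_distcp_tmp is a hypothetical helper name, not part of Hadoop:

```shell
# Reads `hdfs dfs -ls` style output on stdin and prints only the paths whose
# last component starts with ".distcp.tmp.".
# Assumption: permissions, replication, owner, group, size, date and time
# occupy the first seven columns, and the path starts at column eight.
list_distcp_tmp() {
  awk 'NF >= 8 {
    path = $8
    # Re-join paths containing spaces (runs of spaces collapse to one).
    for (i = 9; i <= NF; i++) path = path " " $i
    n = split(path, parts, "/")
    if (index(parts[n], ".distcp.tmp.") == 1) print path
  }'
}

# Possible usage (not executed here):
#   hdfs dfs -ls -R hdfs://${DEST_SITE} | list_distcp_tmp
```

Reviewing the printed paths first is a cautious choice: nothing is deleted until the list has been checked.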

Explorer
Posts: 22
Registered: ‎04-12-2016

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

No, deleting the tmp files will not help, since the tmp files' metadata is stored in the snapshot at the destination. BTW, we tested deleting the tmp files and it didn't help.

When will snapshot rollback be available?

Cloudera Employee
Posts: 13
Registered: ‎03-07-2016

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Are you sure you have deleted everything? It's OK that file metadata
changed. As long as the snapshot diff report is empty, distcp copies
only incremental changes. Or, to make sure of that, after deleting the tmp
files, you can create a new snapshot named s0 on DEST.
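
One way to check the "snapshot diff report is empty" condition is to compare s0 with the current state of the destination using hdfs snapshotDiff, where "." stands for the current state. A minimal sketch, assuming change entries in the report start with +, -, M or R followed by whitespace; snapshot_diff_is_empty and DEST_DIR are hypothetical names:

```shell
# Reads an `hdfs snapshotDiff` report on stdin and succeeds (exit status 0)
# when it contains no change entries.
# Assumption: each change entry starts with +, -, M or R, then whitespace.
snapshot_diff_is_empty() {
  ! grep -Eq '^[+MR-][[:space:]]'
}

# Possible usage (not executed here). DEST_DIR is a hypothetical path to the
# snapshottable directory, run where the destination cluster is the default
# file system:
#   if hdfs snapshotDiff "$DEST_DIR" s0 . | snapshot_diff_is_empty; then
#     echo "DEST unchanged since s0"
#   fi
```

If the check fails, the report itself shows which paths still differ from s0.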

 

I don't know the ETA of snapshot rollback. Yongjun is actively working on a
similar solution. Hopefully he can get it done soon.

Explorer
Posts: 22
Registered: ‎04-12-2016

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Yes, I'm sure we are deleting the tmp files, but since some mappers had already finished, dest_site really was modified.

I modified my crontab as follows:

hdfs dfs -createSnapshot hdfs://${SRC_SITE} s1
hadoop distcp -update -p -diff s0 s1 hdfs://${SRC_SITE} hdfs://${DEST_SITE}
if [ $? -eq 0 ]; then
hdfs dfs -createSnapshot hdfs://${DEST_SITE} s1
hdfs dfs -deleteSnapshot hdfs://${SRC_SITE} s0
hdfs dfs -renameSnapshot hdfs://${SRC_SITE} s1 s0
hdfs dfs -deleteSnapshot hdfs://${DEST_SITE} s0
hdfs dfs -renameSnapshot hdfs://${DEST_SITE} s1 s0
else
hdfs dfs -rm -skipTrash hdfs://${DEST_SITE}/.distcp.tmp.*
hdfs dfs -deleteSnapshot hdfs://${DEST_SITE} s0
hdfs dfs -createSnapshot hdfs://${DEST_SITE} s0
fi
rm -fv ${LOCKFILE}

But I am still testing it to make sure we don't hit corner cases where we might lose data.
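
While testing for corner cases, the snapshot rotation on the success path can be rehearsed without touching either cluster. A minimal sketch, assuming a hypothetical DRY_RUN variable and hypothetical helper names run and rotate_snapshots; unlike the script above, it also stops at the first failing step so a half-finished rotation is not silently continued:

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# rotate_snapshots SRC DEST: the success path from the script above.
# Snapshot DEST as s1, then on each side delete s0 and rename s1 to s0.
# The && chain stops at the first step that fails.
rotate_snapshots() {
  run hdfs dfs -createSnapshot "$2" s1 &&
  run hdfs dfs -deleteSnapshot "$1" s0 &&
  run hdfs dfs -renameSnapshot "$1" s1 s0 &&
  run hdfs dfs -deleteSnapshot "$2" s0 &&
  run hdfs dfs -renameSnapshot "$2" s1 s0
}

# Possible usage (prints the five commands instead of running them):
#   DRY_RUN=1 rotate_snapshots "hdfs://${SRC_SITE}" "hdfs://${DEST_SITE}"
```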

Cloudera Employee
Posts: 13
Registered: ‎03-07-2016

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Sure. Bear in mind that this might be a dangerous approach, since it is not tested at all. You can consider it a temporary workaround. To my knowledge, it's safer to use the snapshot rollback.

Explorer
Posts: 22
Registered: ‎04-12-2016

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

We will wait for snapshot rollback for now. Does that mean upgrading to Cloudera 5.7 will not solve our problem?

That means DistCp snapshot copy for disaster recovery isn't fully mature; so sad to hear that.
