Expert Contributor
Posts: 181
Registered: ‎01-25-2017

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

@Yongjun Zhang Could you please reply to my last comment about using only -diff, with a fallback to a full listing in case of failure?

Cloudera Employee
Posts: 15
Registered: ‎08-20-2015

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Hi Fawze,

You can choose not to use -diff at all. Alternatively, run with -diff and,
if it fails, fall back to another distcp command without -diff that does
the regular distcp (with -update -delete).

Would you please also answer my questions in my last comment?

Thanks.
Cloudera Employee
Posts: 15
Registered: ‎08-20-2015

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Thanks for the clarification. Sorry I missed this reply earlier. Good to know that it did not result from distcp. So there is no snapshot operation failure message even when the operation fails?

Expert Contributor
Posts: 181
Registered: ‎01-25-2017

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

@Yongjun Zhang Your last suggestion, to issue a regular distcp after the diff failure, makes our life more complex, since I need to delete 4 snapshots, create a new s0 snapshot, issue distcp, and then create s0 at the destination.

I am still wondering why the full listing in case of failures was disabled in the new version.
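
The reset sequence described above can be sketched as an ordered list of commands. This is only an illustration under stated assumptions: `reset_plan` and its parameters are hypothetical names, the paths are placeholders, and copying from the `.snapshot/s0` path with `-update -delete` is one reasonable reading of "issue distcp", not something the post specifies. The post also does not say which four snapshots are deleted, so they are passed in explicitly.

```python
def reset_plan(src, dst, stale, base="s0"):
    """Return the commands, in order, that re-establish a common baseline snapshot.

    `stale` is a list of (directory, snapshot_name) pairs to delete first.
    All names here are illustrative; nothing is executed.
    """
    # 1. Drop the snapshots that are no longer usable for -diff.
    plan = [["hdfs", "dfs", "-deleteSnapshot", path, name] for path, name in stale]
    # 2. Take a fresh baseline snapshot on the source.
    plan.append(["hdfs", "dfs", "-createSnapshot", src, base])
    # 3. Full copy from the read-only snapshot (assumption: -update -delete is wanted).
    snapshot_path = src.rstrip("/") + "/.snapshot/" + base
    plan.append(["hadoop", "distcp", "-update", "-delete", snapshot_path, dst])
    # 4. Take the matching baseline snapshot on the destination.
    plan.append(["hdfs", "dfs", "-createSnapshot", dst, base])
    return plan
```

Building the plan as data rather than running it directly makes it easy to log or dry-run the sequence from a cron job before executing each command.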

Cloudera Employee
Posts: 15
Registered: ‎08-20-2015

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Because -diff requires that -delete not be passed, whereas the user may or
may not want -delete when running without -diff. If you run with -diff,
even if we could fall back to a regular distcp, the software doesn't know
whether the user wants -delete and cannot make that decision for the user.
One possibility is to add a new switch to enable that.

BTW, do you see an error message when snapshot operations fail?

Thanks.
Expert Contributor
Posts: 181
Registered: ‎01-25-2017

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

In both cases, whether or not the user passes -delete in the regular distcp after the fallback, the -diff in the next run will correct the situation.

Yes, in case of a snapshot error we get a network error message, such as a connection timeout between node xxx and namenodexx:8020. Handling a different error for each snapshot in one cron job adds more complexity to the snapshot cycle management.

More importantly, changes like this that are not backward compatible should be communicated or mentioned in the release notes or in the rdiff documentation. Imagine that I want to upgrade my cluster: after the upgrade, I will either have to roll back because of -rdiff, or find a solution and implement it in time.

I think there should be another switch in the code that gives the user more options.


Cloudera Employee
Posts: 15
Registered: ‎08-20-2015

Re: Killing the Distcp which running over snapshot listing all snapshottable path in the next run

Hi Fawze,

"In both case if user passes -delete or not in the reqular distcp after the
fallback, the -diff in the next run will correct the situation."
I am not sure I follow the above statement.

The fallback feature was disabled in HDFS-10313. I created the following
jira:

https://issues.apache.org/jira/browse/HDFS-11706

to re-enable.

Until that jira is implemented, I think you could make your script detect
the failure and then have it re-issue a regular distcp command as a manual
fallback.
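
A minimal sketch of such a manual fallback, assuming a Python wrapper around the `hadoop distcp` command line. The function names, the `run` hook, and the default of `-update -delete` for the fallback are illustrative assumptions, not part of distcp itself; paths and snapshot names are placeholders.

```python
import subprocess


def diff_command(src, dst, from_snap, to_snap):
    """Incremental copy driven by the snapshot diff (no -delete alongside -diff)."""
    return ["hadoop", "distcp", "-update", "-diff", from_snap, to_snap, src, dst]


def full_command(src, dst, delete=True):
    """Regular distcp with a full listing; whether to pass -delete is the caller's choice."""
    cmd = ["hadoop", "distcp", "-update"]
    if delete:
        cmd.append("-delete")
    return cmd + [src, dst]


def sync_with_fallback(src, dst, from_snap, to_snap, delete=True, run=subprocess.call):
    """Try the -diff run first; on a non-zero exit code, re-issue a regular distcp.

    `run` takes an argument list and returns an exit code. It is a hypothetical
    hook so the logic can be exercised without a cluster.
    """
    if run(diff_command(src, dst, from_snap, to_snap)) == 0:
        return "diff"
    rc = run(full_command(src, dst, delete))
    if rc != 0:
        raise RuntimeError("fallback distcp failed with exit code %d" % rc)
    return "full"
```

The return value tells the calling script which path was taken, so it can decide whether the snapshots need to be reset before the next -diff run. The `delete` argument keeps the -delete decision with the user, which is the decision the software cannot make on its own.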

Thanks.

--Yongjun
