Support Questions
Find answers, ask questions, and share your expertise

Require version of distcp that supports the -direct copy option


We have CDH6 clusters running in Google Cloud that we back up to GCS using distcp. By default, distcp first writes each file to a temporary name on cloud storage and then "promotes" it to the final name. This seems to be causing a lot of overhead, resulting in rate limit errors on the GCS side. The -direct copy option offered by newer versions of distcp appears to address this. Is it possible to upgrade the version of distcp distributed with Hadoop on CDH 6.3?
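For reference, here is a rough sketch of the kind of invocation we are hoping to run, assuming a distcp build that accepts -direct. The source path, bucket name, and the -m and -bandwidth values are hypothetical placeholders, not our real settings. The script only prints the command unless RUN_DISTCP=1 is set.

```shell
#!/usr/bin/env bash
# Hypothetical source and destination; replace with real paths.
SRC="hdfs:///data/warehouse"
DST="gs://example-backup-bucket/warehouse"

build_distcp_cmd() {
  # -direct    : write straight to the final path (no tmp file + rename)
  # -m         : cap the number of simultaneous map tasks (value is a guess)
  # -bandwidth : cap MB/s per map task (value is a guess)
  echo "hadoop distcp -direct -m 20 -bandwidth 50 $1 $2"
}

CMD="$(build_distcp_cmd "$SRC" "$DST")"
echo "$CMD"

# Only execute when explicitly requested, so the script is safe as a dry run.
if [ "${RUN_DISTCP:-0}" = "1" ]; then
  $CMD
fi
```

Lowering -m and -bandwidth may also reduce request pressure on GCS, but it would not remove the extra rename step that -direct is meant to avoid.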
