Hadoop distcp to Google cloud storage Private endpoint

New Contributor

Hello,

We are trying to use the hadoop distcp command to transfer data from an on-prem Hadoop cluster to GCS through a private endpoint.

We have tested many different methods, and finally added the following lines directly to /etc/hosts. (The other GCP auth settings have been added to core-site.xml.)
XX.XX.XX.XX storage.googleapis.com
XX.XX.XX.XX googleapis.com
*XX.XX.XX.XX is the IP of our private endpoint

The hadoop fs -ls and -cp commands can list and copy objects in the GCS bucket correctly.

But with the hadoop distcp command, the MapReduce jobs always go to the public endpoint...
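
One thing that may be worth ruling out here (an assumption, not a confirmed cause): hadoop fs runs on the node where the command is typed, while distcp runs its copy work as MapReduce tasks on the worker nodes, and each worker resolves names using its own /etc/hosts. A minimal Python sketch of a per-node check follows. The helper names parse_hosts and check_node are made up for illustration, and the expected IP is passed on the command line.

```python
import socket
import sys


def parse_hosts(text):
    """Map each hostname in hosts-file text to the first IP listed for it."""
    mapping = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # Keep the first entry for a name, ignoring later duplicates.
            mapping.setdefault(name.lower(), ip)
    return mapping


def check_node(expected_ip, hostname="storage.googleapis.com",
               hosts_path="/etc/hosts"):
    """Compare the pinned and the actually resolved IP with the expected one."""
    with open(hosts_path) as f:
        pinned = parse_hosts(f.read()).get(hostname)
    try:
        resolved = socket.gethostbyname(hostname)
    except OSError:
        resolved = None  # name could not be resolved on this node
    ok = pinned == expected_ip and resolved == expected_ip
    return {"pinned": pinned, "resolved": resolved, "ok": ok}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python check_endpoint.py <private-endpoint-IP>
    print(socket.gethostname(), check_node(sys.argv[1]))
```

Running this on every worker node with the private endpoint IP as the argument would show whether any node still resolves storage.googleapis.com to a public address. Note that /etc/hosts matches exact names only, so an entry for googleapis.com does not cover other subdomains under it.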

Does anyone know how to make distcp work with a GCS private endpoint?

Or does anyone know which endpoint distcp is trying to reach (so that I can add it to /etc/hosts)?

Thank you.

Community Manager

Welcome to the community, @Hz. Perhaps someone like @Stella Tang can point you in the right direction.


Cy Jervis, Manager, Community Program