Hello,
We are trying to use the hadoop distcp command to transfer data from our on-prem Hadoop cluster to GCS through a private endpoint.
We have tested many different methods, and finally we added the following lines directly to /etc/hosts. (The other GCP authentication settings have been added to core-site.xml.)
XX.XX.XX.XX storage.googleapis.com
XX.XX.XX.XX googleapis.com
*XX.XX.XX.XX is the IP address of our private endpoint.
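distcp performs its copies inside MapReduce map tasks, which run in YARN containers on the cluster's worker nodes rather than on the machine where the command is typed. An /etc/hosts override therefore only affects those tasks if it is present, and actually used by the resolver, on each node that runs them. Comparing resolution across nodes might be one way to rule this out. Below is a minimal sketch, assuming Python 3 is available on the nodes: it parses hosts-file text and reports, for each overridden name, the pinned IP next to the IP the local resolver actually returns. The names `parse_hosts`, `report`, `main` and `NAMES` are hypothetical helpers, not part of Hadoop or the GCS connector.

```python
import socket
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hostnames pinned to the private endpoint in /etc/hosts (from the post above).
NAMES = ("storage.googleapis.com", "googleapis.com")


def parse_hosts(text: str) -> Dict[str, str]:
    """Map each hostname in hosts-file text to its IP, keeping the first entry per name."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)
    return mapping


def report(
    hosts_text: str,
    names: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (ip pinned in hosts text or None, ip the resolver returns or None)}."""
    pinned = parse_hosts(hosts_text)
    out: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for name in names:
        try:
            resolved: Optional[str] = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            resolved = None
        out[name] = (pinned.get(name), resolved)
    return out


def main() -> None:
    """Print one line per name for the host this runs on."""
    with open("/etc/hosts") as f:
        text = f.read()
    host = socket.gethostname()
    for name, (pinned, resolved) in report(text, NAMES).items():
        status = "OK" if pinned is not None and pinned == resolved else "CHECK"
        print(f"{host} {name}: /etc/hosts={pinned} resolved={resolved} [{status}]")
```

Saved as a hypothetical `check_hosts.py`, it could be run on each worker with `python3 -c 'import check_hosts; check_hosts.main()'`. A `CHECK` line on a node would suggest that tasks scheduled there are not using the private endpoint address.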
The hadoop fs -ls and -cp commands list and copy objects in the GCS bucket correctly.
However, with the hadoop distcp command, its MapReduce jobs always go to the public endpoint.
Does anyone know how to make distcp work with a GCS private endpoint?
Or does anyone know which endpoint distcp is trying to reach, so that I can add it to /etc/hosts?
Thank you.