
Copy HDFS data to GCP

Contributor

Hi All,

I have two cases:

Case A: I have a Kerberized production cluster and want to copy files from its Linux box to GCP storage (non-Kerberized).

Currently I am doing this manually: downloading the files from the Linux box to my local system using WinSCP and then uploading them to Google Cloud Storage.

If this can be done using DistCp, providing the steps would be helpful.

 

Case B: From the same Kerberized cluster, I want to copy data from HDFS to Google Cloud Storage (non-Kerberized).

Thanks,

Syed.

1 REPLY

Master Collaborator

Hello @syedshakir ,

Could you let us know which CDH version you are running?

 

Case A:

If I'm understanding correctly, you have a Kerberized cluster but the files are on the local filesystem, not on HDFS, so you don't need Kerberos authentication. Just refer to the Google doc below; there are a few ways to do it:

https://cloud.google.com/storage/docs/uploading-objects#upload-object-cli
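For example, a minimal upload with the gsutil CLI might look like this (the bucket name and local paths here are placeholders):

# Authenticate once (with your user account or a service account)
gcloud auth login

# Copy a single file to the bucket
gsutil cp /path/to/local/file.csv gs://my-bucket/

# Copy a whole directory tree in parallel
gsutil -m cp -r /path/to/local/dir gs://my-bucket/dir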

 

Case B:

To be honest, I have never done this myself, so here is what I would try:

1. Follow the document below to configure Google Cloud Storage with Hadoop:

https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_gcs_config.html
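As a rough sketch, the core-site.xml entries for the GCS connector usually look something like this (exact property names depend on the connector version, and the project ID and keyfile path are placeholders, so please verify against the doc above):

<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.GoogleHadoopFileSystem</value>
</property>
<property>
  <name>fs.gs.project.id</name>
  <value>my-gcp-project</value>
</property>
<property>
  <name>google.cloud.auth.service.account.enable</name>
  <value>true</value>
</property>
<property>
  <name>google.cloud.auth.service.account.json.keyfile</name>
  <value>/etc/hadoop/conf/gcs-key.json</value>
</property>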

2. If DistCp does not work, follow this document to configure the properties for copying between secure and insecure clusters:

https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cdh_admin_distcp_secure_insecure.htm...
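Once the connector is configured, the DistCp run from the secure cluster might look roughly like this (the principal, HDFS path, and bucket are placeholders; the fallback property comes from the secure-to-insecure doc above and may not be strictly required for a GCS target):

# Authenticate to the Kerberized cluster first
kinit syed@EXAMPLE.COM

hadoop distcp \
  -Dipc.client.fallback-to-simple-auth-allowed=true \
  hdfs:///user/syed/source_dir \
  gs://my-bucket/target_dir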

3. Save the whole output of the DistCp job and upload it here, and I can help you check it. Remember to remove sensitive information (such as hostnames and IPs) from the logs before you upload.

If the DistCp output doesn't contain Kerberos-related errors, you can enable debug logging, re-run the DistCp job, and save the new output with the debug logs:

export HADOOP_ROOT_LOGGER=DEBUG,console; export HADOOP_OPTS="-Dsun.security.krb5.debug=true"
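For example, to capture the debug run in a file you can upload (the HDFS path and bucket are placeholders):

export HADOOP_ROOT_LOGGER=DEBUG,console
export HADOOP_OPTS="-Dsun.security.krb5.debug=true"
hadoop distcp hdfs:///user/syed/source_dir gs://my-bucket/target_dir 2>&1 | tee distcp_debug.log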

 

Thanks,

Will