I have two cases:
Case A: I have a Kerberised production cluster and want to copy files from this Linux box to GCP storage (non-Kerberised).
Currently I do this manually: I download the Linux files to my local system with WinSCP and then upload them to Google Cloud Storage.
If this can be done with DistCp, the steps would be helpful.
Case B: From the same Kerberised cluster, I want to copy HDFS data to Google Cloud Storage (non-Kerberised).
Hello @syedshakir ,
Please let us know which CDH version you are using.
If I understand correctly, you have a Kerberized cluster but the files are on the local filesystem, not on HDFS, so you don't need Kerberos authentication for that case. Refer to the Google docs below; there are a few ways to do it:
To be honest, I have never done this myself, so I would try the following:
1. Follow the document below to configure Google Cloud Storage with Hadoop:
2. If distcp does not work, follow this document to configure some additional properties:
3. Save the whole output of distcp and upload it here so I can help you check it. Remember to remove sensitive information (such as hostnames and IPs) from the logs before uploading.
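For reference, once the connector from step 1 is configured, the copy itself might look roughly like the sketch below. The paths and the bucket name are placeholders I made up, and the script only prints the commands unless you set DRY_RUN=0, so treat it as a starting point rather than a tested procedure.

```shell
#!/usr/bin/env bash
# Sketch only: every path and bucket name below is a hypothetical placeholder.
LOCAL_SRC="/data/export"                # assumption: files on the Linux box (Case A)
HDFS_SRC="hdfs:///user/example/data"    # assumption: HDFS directory (Case B)
GCS_DST="gs://example-bucket/landing"   # assumption: target bucket and prefix
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands only, 0 = run them

run() {
  # Print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Check that the gs:// filesystem is reachable through the connector.
run hadoop fs -ls "$GCS_DST"

# Case A: files on the local filesystem to GCS.
run hadoop fs -copyFromLocal "$LOCAL_SRC" "$GCS_DST/"

# Case B: HDFS to GCS. The HDFS side needs a valid Kerberos ticket,
# so list the ticket cache first (run kinit beforehand if it is empty).
run klist
run hadoop distcp "$HDFS_SRC" "$GCS_DST/"
```

Run it once as-is to review the printed commands, then re-run with DRY_RUN=0. For Case A, `gsutil cp -r` would be an alternative if the Google Cloud SDK is installed on the box.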
If the distcp output doesn't contain Kerberos-related errors, enable debug logging, re-run the distcp job, and save the new output with the debug logs:
export HADOOP_ROOT_LOGGER=DEBUG,console; export HADOOP_OPTS="-Dsun.security.krb5.debug=true"
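To save the output and strip IP addresses before uploading, something like the following sketch could help. The log file names and the `redact` helper are my own invention, and the pattern only masks IPv4-looking strings, so hostnames still need to be removed by hand (it may also mask four-part version numbers).

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace anything that looks like an IPv4 address with <IP>.
redact() {
  sed -E 's/[0-9]{1,3}(\.[0-9]{1,3}){3}/<IP>/g'
}

# Intended usage (commented out so the sketch runs without Hadoop):
#   hadoop distcp <source> <target> 2>&1 | tee distcp_debug.log
#   redact < distcp_debug.log > distcp_debug_redacted.log

# Small self-check on a made-up log line.
echo "Connecting to namenode at 10.1.2.3:8020" | redact
```

Check the redacted file manually before posting it, since the pattern knows nothing about hostnames, principals, or bucket names.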