<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Copy HDFS data to GCP in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Copy-HDFS-data-to-GCP/m-p/349340#M235624</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/69915"&gt;@syedshakir&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Could you let us know your CDH version?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Case A:&lt;/P&gt;&lt;P&gt;If I understand correctly, you have a Kerberized cluster but the files are on the local filesystem, not on HDFS, so Kerberos authentication is not involved. Refer to the Google documentation below; it describes a few ways to upload:&lt;/P&gt;&lt;P&gt;&lt;A href="https://cloud.google.com/storage/docs/uploading-objects#upload-object-cli" target="_blank"&gt;https://cloud.google.com/storage/docs/uploading-objects#upload-object-cli&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Case B:&lt;/P&gt;&lt;P&gt;To be honest, I have not done this myself, so I would try the following:&lt;/P&gt;&lt;P&gt;1. Follow the document below to configure Google Cloud Storage with Hadoop:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_gcs_config.html" target="_blank"&gt;https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_gcs_config.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;2. If DistCp does not work, follow this document to configure the relevant properties:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cdh_admin_distcp_secure_insecure.html" target="_blank"&gt;https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cdh_admin_distcp_secure_insecure.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;3. Save the complete DistCp output and upload it here, and I can help you check it. Remember to remove sensitive information (such as hostnames and IPs) from the logs before uploading.&lt;/P&gt;&lt;P&gt;If the DistCp output does not contain Kerberos-related errors, enable debug logging, re-run the DistCp job, and save the new output:&lt;/P&gt;&lt;P&gt;export HADOOP_ROOT_LOGGER=DEBUG,console; export HADOOP_OPTS="-Dsun.security.krb5.debug=true"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Will&lt;/P&gt;</description>
    <pubDate>Tue, 02 Aug 2022 10:32:46 GMT</pubDate>
    <dc:creator>willx</dc:creator>
    <dc:date>2022-08-02T10:32:46Z</dc:date>
    <item>
      <title>Copy HDFS data to GCP</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Copy-HDFS-data-to-GCP/m-p/348543#M235386</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I have two cases:&lt;/P&gt;&lt;P&gt;Case A: I have a Kerberized production cluster, and I want to copy files from its Linux box to GCP storage (non-Kerberized).&lt;/P&gt;&lt;P&gt;Currently I do this manually: I download the files to my local system using WinSCP and upload them to Google Cloud Storage.&lt;/P&gt;&lt;P&gt;If this can be done using DistCp, providing the steps would be helpful.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Case B: From the same Kerberized cluster, I want to copy HDFS data to Google Cloud Storage (non-Kerberized).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Syed.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jul 2022 13:05:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Copy-HDFS-data-to-GCP/m-p/348543#M235386</guid>
      <dc:creator>syedshakir</dc:creator>
      <dc:date>2022-07-22T13:05:45Z</dc:date>
    </item>
    <item>
      <title>Re: Copy HDFS data to GCP</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Copy-HDFS-data-to-GCP/m-p/349340#M235624</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/69915"&gt;@syedshakir&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Could you let us know your CDH version?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Case A:&lt;/P&gt;&lt;P&gt;If I understand correctly, you have a Kerberized cluster but the files are on the local filesystem, not on HDFS, so Kerberos authentication is not involved. Refer to the Google documentation below; it describes a few ways to upload:&lt;/P&gt;&lt;P&gt;&lt;A href="https://cloud.google.com/storage/docs/uploading-objects#upload-object-cli" target="_blank"&gt;https://cloud.google.com/storage/docs/uploading-objects#upload-object-cli&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Case B:&lt;/P&gt;&lt;P&gt;To be honest, I have not done this myself, so I would try the following:&lt;/P&gt;&lt;P&gt;1. Follow the document below to configure Google Cloud Storage with Hadoop:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_gcs_config.html" target="_blank"&gt;https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_gcs_config.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;2. If DistCp does not work, follow this document to configure the relevant properties:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cdh_admin_distcp_secure_insecure.html" target="_blank"&gt;https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cdh_admin_distcp_secure_insecure.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;3. Save the complete DistCp output and upload it here, and I can help you check it. Remember to remove sensitive information (such as hostnames and IPs) from the logs before uploading.&lt;/P&gt;&lt;P&gt;If the DistCp output does not contain Kerberos-related errors, enable debug logging, re-run the DistCp job, and save the new output:&lt;/P&gt;&lt;P&gt;export HADOOP_ROOT_LOGGER=DEBUG,console; export HADOOP_OPTS="-Dsun.security.krb5.debug=true"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Will&lt;/P&gt;</description>
      <pubDate>Tue, 02 Aug 2022 10:32:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Copy-HDFS-data-to-GCP/m-p/349340#M235624</guid>
      <dc:creator>willx</dc:creator>
      <dc:date>2022-08-02T10:32:46Z</dc:date>
    </item>
  </channel>
</rss>
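As a rough sketch of step 1 in the answer above (configuring Google Cloud Storage with Hadoop), the GCS connector is typically wired in through `core-site.xml` properties such as the following. The key-file path is a placeholder, and on a Cloudera Manager-managed cluster these values would normally be set through CM rather than by editing the file by hand:

```xml
<!-- core-site.xml fragment: Google Cloud Storage connector for Hadoop.
     The keyfile path below is an illustrative placeholder. -->
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
</property>
<property>
  <name>google.cloud.auth.service.account.enable</name>
  <value>true</value>
</property>
<property>
  <!-- Service-account key downloaded from the GCP console -->
  <name>google.cloud.auth.service.account.json.keyfile</name>
  <value>/etc/hadoop/conf/gcs-key.json</value>
</property>
```

With the connector configured, HDFS paths can be copied straight to a `gs://` bucket URI with DistCp, e.g. `hadoop distcp /user/syed/data gs://my-bucket/data` (bucket name hypothetical).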


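For step 2, the linked Cloudera document on DistCp between secure and insecure clusters centers on letting the Kerberized client fall back to simple authentication when the remote side is unsecured; a minimal sketch of that property, which can equally be passed on the DistCp command line via `-D`:

```xml
<!-- core-site.xml fragment: allow a Kerberized client to talk to an
     unsecured (simple-auth) service during a secure-to-insecure copy. -->
<property>
  <name>ipc.client.fallback-to-simple-auth-allowed</name>
  <value>true</value>
</property>
```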