<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question How to copy HDFS file to AWS S3 Bucket?  hadoop distcp is not working. in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99956#M13004</link>
    <description>&lt;P&gt;I used hadoop distcp as follows:&lt;/P&gt;&lt;P&gt;hadoop distcp hdfs://hdfs_host:hdfs_port/hdfs_path/hdfs_file.txt s3n://s3_aws_access_key_id:s3_aws_access_key_secret@my_bucketname/&lt;/P&gt;&lt;P&gt;My Hadoop cluster is behind the company HTTP proxy server, and I can't figure out how to specify the proxy when connecting to S3. The error I get is: ERROR tools.DistCp: Invalid arguments: org.apache.http.conn.ConnectTimeoutException: Connect to my_bucketname.s3.amazonaws.com:443 timed out.&lt;/P&gt;</description>
    <pubDate>Fri, 18 Dec 2015 11:02:05 GMT</pubDate>
    <dc:creator>ts_venu</dc:creator>
    <dc:date>2015-12-18T11:02:05Z</dc:date>
    <item>
      <title>How to copy HDFS file to AWS S3 Bucket?  hadoop distcp is not working.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99956#M13004</link>
      <description>&lt;P&gt;I used hadoop distcp as follows:&lt;/P&gt;&lt;P&gt;hadoop distcp hdfs://hdfs_host:hdfs_port/hdfs_path/hdfs_file.txt s3n://s3_aws_access_key_id:s3_aws_access_key_secret@my_bucketname/&lt;/P&gt;&lt;P&gt;My Hadoop cluster is behind the company HTTP proxy server, and I can't figure out how to specify the proxy when connecting to S3. The error I get is: ERROR tools.DistCp: Invalid arguments: org.apache.http.conn.ConnectTimeoutException: Connect to my_bucketname.s3.amazonaws.com:443 timed out.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Dec 2015 11:02:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99956#M13004</guid>
      <dc:creator>ts_venu</dc:creator>
      <dc:date>2015-12-18T11:02:05Z</dc:date>
    </item>
    <item>
      <title>Re: How to copy HDFS file to AWS S3 Bucket?  hadoop distcp is not working.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99957#M13005</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/1590/venushanmukappa.html" nodeid="1590"&gt;@Venu Shanmukappa&lt;/A&gt;&lt;P&gt;The "443 timed out" error points to a connectivity problem (&lt;A target="_blank" href="http://stackoverflow.com/questions/25754530/running-aws-java-sdk-code-without-public-ip"&gt;related Stack Overflow question&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;The cluster needs connectivity to S3. See if &lt;A target="_blank" href="http://www.cyberciti.biz/faq/linux-unix-set-proxy-environment-variable/"&gt;this&lt;/A&gt; helps.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Dec 2015 21:36:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99957#M13005</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2015-12-18T21:36:08Z</dc:date>
    </item>
    <item>
      <title>Re: How to copy HDFS file to AWS S3 Bucket?  hadoop distcp is not working.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99958#M13006</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/1590/venushanmukappa.html" nodeid="1590"&gt;@Venu Shanmukappa&lt;/A&gt;&lt;P&gt; I'm glad you're utilizing HCC. Let us know if Neeraj's link helps, and mark it as the best answer if it does. &lt;A rel="user" href="https://community.cloudera.com/users/325/azeltov.html" nodeid="325"&gt;@azeltov&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 19 Dec 2015 04:07:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99958#M13006</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2015-12-19T04:07:33Z</dc:date>
    </item>
    <item>
      <title>Re: How to copy HDFS file to AWS S3 Bucket?  hadoop distcp is not working.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99959#M13007</link>
      <description>&lt;P&gt;If you use the s3a:// client, you can set the fs.s3a.proxy.* settings (host, port, username, password, domain, workstation) to get through the proxy.&lt;/P&gt;&lt;P&gt;See &lt;A href="https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html"&gt;https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 19 Dec 2015 04:27:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99959#M13007</guid>
      <dc:creator>stevel</dc:creator>
      <dc:date>2015-12-19T04:27:11Z</dc:date>
    </item>
    <item>
      <title>Re: How to copy HDFS file to AWS S3 Bucket?  hadoop distcp is not working.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99960#M13008</link>
      <description>&lt;P&gt;It won't; Java doesn't look at the OS proxy settings. (There are a couple of exceptions, but they don't usually surface in a world where applets are disabled.)&lt;/P&gt;</description>
      <pubDate>Sat, 19 Dec 2015 04:28:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99960#M13008</guid>
      <dc:creator>stevel</dc:creator>
      <dc:date>2015-12-19T04:28:51Z</dc:date>
    </item>
    <item>
      <title>Re: How to copy HDFS file to AWS S3 Bucket?  hadoop distcp is not working.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99961#M13009</link>
      <description>&lt;P&gt;Thanks, all, for your replies.&lt;/P&gt;&lt;P&gt;
After adding fs.s3a.proxy.port and fs.s3a.proxy.host to core-site.xml as suggested by stevel, I am able to move HDFS files directly
to AWS S3 using the s3a:// URI scheme with the distcp tool.&lt;/P&gt;</description>
      <pubDate>Sat, 20 Feb 2016 03:21:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99961#M13009</guid>
      <dc:creator>ts_venu</dc:creator>
      <dc:date>2016-02-20T03:21:14Z</dc:date>
    </item>
    <item>
      <title>Re: How to copy HDFS file to AWS S3 Bucket?  hadoop distcp is not working.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99962#M13010</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/1590/venushanmukappa.html" nodeid="1590"&gt;@Venu Shanmukappa&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You can also use the Hadoop 'cp' command after following the steps below:&lt;/P&gt;&lt;P&gt;1) Configure the core-site.xml file with the following AWS properties:&lt;/P&gt;&lt;P&gt;&amp;lt;property&amp;gt;&lt;/P&gt;&lt;P&gt;   &amp;lt;name&amp;gt;fs.s3n.awsAccessKeyId&amp;lt;/name&amp;gt;&lt;/P&gt;&lt;P&gt;   &amp;lt;value&amp;gt;AWS access key ID. Omit for role-based authentication.&amp;lt;/value&amp;gt;&lt;/P&gt;&lt;P&gt;
&amp;lt;/property&amp;gt;&lt;/P&gt;&lt;P&gt;
&amp;lt;property&amp;gt;&lt;/P&gt;&lt;P&gt;
   &amp;lt;name&amp;gt;fs.s3n.awsSecretAccessKey&amp;lt;/name&amp;gt;&lt;/P&gt;&lt;P&gt;   &amp;lt;value&amp;gt;AWS secret key. Omit for role-based authentication.&amp;lt;/value&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;lt;/property&amp;gt;&lt;/P&gt;&lt;P&gt;2) Add the JAR file provided by AWS (aws-java-sdk-1.7.4.jar) to the HADOOP_CLASSPATH environment variable using the command below.&lt;/P&gt;&lt;P&gt;$ export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*&lt;/P&gt;&lt;P&gt;3) The hadoop "cp" command will then copy the source data (local HDFS) to the destination (AWS S3 bucket).&lt;/P&gt;&lt;P&gt;$ hadoop fs -cp /user/ubuntu/filename.txt s3n://S3-Bucket-Name/filename.txt&lt;/P&gt;</description>
      <pubDate>Sun, 09 Jul 2017 01:21:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99962#M13010</guid>
      <dc:creator>clouderabhi</dc:creator>
      <dc:date>2017-07-09T01:21:12Z</dc:date>
    </item>
    <item>
      <title>Re: How to copy HDFS file to AWS S3 Bucket?  hadoop distcp is not working.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99963#M13011</link>
      <description>&lt;P&gt;Could you please explain this in detail?&lt;/P&gt;</description>
      <pubDate>Fri, 09 Nov 2018 14:22:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99963#M13011</guid>
      <dc:creator>komathigv</dc:creator>
      <dc:date>2018-11-09T14:22:46Z</dc:date>
    </item>
    <item>
      <title>Re: How to copy HDFS file to AWS S3 Bucket?  hadoop distcp is not working.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99964#M13012</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1590/venushanmukappa.html" nodeid="1590"&gt;@Venu Shanmukappa&lt;/A&gt; How did you add the proxy? Can you please explain?&lt;/P&gt;</description>
      <pubDate>Fri, 09 Nov 2018 14:23:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-copy-HDFS-file-to-AWS-S3-Bucket-hadoop-distcp-is-not/m-p/99964#M13012</guid>
      <dc:creator>komathigv</dc:creator>
      <dc:date>2018-11-09T14:23:21Z</dc:date>
    </item>
  </channel>
</rss>

