Created 12-18-2015 03:02 AM
I used hadoop distcp as given below:
hadoop distcp hdfs://hdfs_host:hdfs_port/hdfs_path/hdfs_file.txt s3n://s3_aws_access_key_id:s3_aws_access_key_secret@my_bucketname/
My Hadoop cluster is behind the company's HTTP proxy server, and I can't figure out how to specify the proxy when connecting to S3. The error I get is: ERROR tools.DistCp: Invalid arguments: org.apache.http.conn.ConnectTimeoutException: Connect to my_bucketname.s3.amazonaws.com:443 timed out.
Created 12-18-2015 08:27 PM
If you use the s3a:// client, you can set the fs.s3a.proxy settings (host, port, username, password, domain, workstation) to get through the proxy.
See https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html
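For example, assuming the proxy is reachable at proxy.example.com:8080 (placeholder values) and needs no authentication, something along these lines should work; the same properties can equally be put in core-site.xml instead of on the command line:

hadoop distcp \
  -Dfs.s3a.access.key=s3_aws_access_key_id \
  -Dfs.s3a.secret.key=s3_aws_access_key_secret \
  -Dfs.s3a.proxy.host=proxy.example.com \
  -Dfs.s3a.proxy.port=8080 \
  hdfs://hdfs_host:hdfs_port/hdfs_path/hdfs_file.txt s3a://my_bucketname/

fs.s3a.proxy.username and fs.s3a.proxy.password can be added the same way if the proxy requires authentication.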
Created 12-18-2015 08:28 PM
It won't; Java doesn't look at the OS proxy settings. (There are a couple of exceptions, but they don't usually surface in a world where applets are disabled.)
Created 12-18-2015 08:07 PM
I'm glad you're utilizing HCC. Let us know if Neeraj's link helps and mark as best answer if it does. @azeltov
Created 02-19-2016 07:21 PM
Thanks all for your replies...
After adding fs.s3a.proxy.port and fs.s3a.proxy.host to core-site.xml as suggested by stevel, I am able to move HDFS files directly to AWS S3 from the distcp tool using the s3a:// URI scheme.
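To spell it out, a minimal sketch of the core-site.xml entries (the proxy host and port values here are placeholders for your own proxy):

<property>
  <name>fs.s3a.proxy.host</name>
  <value>proxy.example.com</value>
</property>
<property>
  <name>fs.s3a.proxy.port</name>
  <value>8080</value>
</property>

With these in place, the original distcp command works once the target URI is changed from s3n:// to s3a://.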
Created 11-09-2018 06:23 AM
@Venu Shanmukappa, how did you add the proxy? Can you please explain?
Created 07-08-2017 06:21 PM
You can also use the Hadoop 'cp' command after following the steps below:
1) Configure the core-site.xml file with the following AWS properties:
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>AWS access key ID. Omit for Role-based authentication.</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>AWS secret key. Omit for Role-based authentication.</value>
</property>
2) Add the AWS SDK JAR (aws-java-sdk-1.7.4.jar) provided by AWS to the HADOOP_CLASSPATH environment variable using the command below.
$ export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*
3) The hadoop "cp" command will copy the source data (local HDFS) to the destination (AWS S3 bucket).
$ hadoop fs -cp /user/ubuntu/filename.txt s3n://S3-Bucket-Name/filename.txt
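If you would rather not store the keys in core-site.xml, the same properties can be passed on the command line for a one-off copy (the key values and bucket name are placeholders; note they will be visible in your shell history and process list):

$ hadoop fs -Dfs.s3n.awsAccessKeyId=YOUR_ACCESS_KEY -Dfs.s3n.awsSecretAccessKey=YOUR_SECRET_KEY -cp /user/ubuntu/filename.txt s3n://S3-Bucket-Name/filename.txt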
Created 11-09-2018 06:22 AM
Could you please explain this in detail?