Created 01-12-2017 08:25 PM
How do I install the hadoop-aws module to copy from on-premises HDFS to AWS S3? I need the s3DistCp command.
Created 01-13-2017 03:46 PM
distcp recognizes the s3 (and s3a) protocols via the default libraries already bundled with Hadoop, so no separate module installation is needed.
For example, to move data from HDFS to S3:
hadoop distcp <current_cluster_folder> s3[a]://<bucket_info>
If you're looking for ways to manage access (via AWS keys) to S3 buckets in Hadoop, this article describes a secure way to do that.
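As a concrete sketch of that approach (the bucket name and JCEKS paths below are placeholders, not from the original post): store the keys in a Hadoop credential provider and point distcp at it.
# Store the AWS keys in an HDFS-backed JCEKS credential store; each command prompts for the value
hadoop credential create fs.s3a.access.key -provider jceks://hdfs/aws/aws.jceks
hadoop credential create fs.s3a.secret.key -provider jceks://hdfs/aws/aws.jceks
# Reference the store when copying; only the bucket name goes in the s3a URI
hadoop distcp -Dhadoop.security.credential.provider.path=jceks://hdfs/aws/aws.jceks /data/source s3a://my-bucket/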
Created 01-14-2017 04:34 PM
When I run the following, it prompts me for a password:
[root@test232 conf]# hadoop credential create fs.s3a.access.key -provider localjceks://file/var/tmp/aws.jceks
Enter password:
Enter password again:
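As far as I know, that prompt is asking for the value of the alias being stored (here, the S3 access key itself), not for a password protecting the store. For scripting, the value can also be passed with the -value flag (test use only, since it lands in shell history):
# Supply the credential values non-interactively; <YOUR_ACCESS_KEY>/<YOUR_SECRET_KEY> are placeholders
hadoop credential create fs.s3a.access.key -value <YOUR_ACCESS_KEY> -provider localjceks://file/var/tmp/aws.jceks
hadoop credential create fs.s3a.secret.key -value <YOUR_SECRET_KEY> -provider localjceks://file/var/tmp/aws.jceks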
Created 01-14-2017 04:46 PM
When I enter the access key and the secret at the password prompts, I get this:
[hdfs@test232 ~]$ hdfs dfs -Dhadoop.security.credential.provider.path=jceks://hdfs/aws/aws.jceks -ls s3a://s3-us-west-2.amazonaws.com/kartik-test
17/01/14 07:51:00 INFO s3a.S3AFileSystem: Caught an AmazonServiceException, which means your request made it to Amazon S3, but was rejected with an error response for some reason.
17/01/14 07:51:00 INFO s3a.S3AFileSystem: Error Message: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: C3EFA25EC200D255, AWS Error Code: null, AWS Error Message: Forbidden
17/01/14 07:51:00 INFO s3a.S3AFileSystem: HTTP Status Code: 403
17/01/14 07:51:00 INFO s3a.S3AFileSystem: AWS Error Code: null
17/01/14 07:51:00 INFO s3a.S3AFileSystem: Error Type: Client
17/01/14 07:51:00 INFO s3a.S3AFileSystem: Request ID: C3EFA25EC200D255
17/01/14 07:51:00 INFO s3a.S3AFileSystem: Class Name: com.amazonaws.services.s3.model.AmazonS3Exception
-ls: Fatal internal error
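A 403 Forbidden on a signed request usually means the credentials sent were wrong or missing. One quick sanity check (a sketch, using the provider path from the -ls command above) is to confirm both aliases actually made it into the store:
# Should print fs.s3a.access.key and fs.s3a.secret.key
hadoop credential list -provider jceks://hdfs/aws/aws.jceks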
Created 01-14-2017 07:38 PM
[hdfs@test232 ~]$ curl http://kartik-test.s3-us-west-2.amazonaws.com
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>kartik-test</Name>
  <Prefix></Prefix>
  <Marker></Marker>
  <MaxKeys>1000</MaxKeys>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>hosts</Key>
    <LastModified>2017-01-12T19:48:14.000Z</LastModified>
    <ETag>"881dc3861c3c8a28e213790785a940b7"</ETag>
    <Size>44</Size>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
  <Contents>
    <Key>logs/</Key>
    <LastModified>2017-01-14T17:01:56.000Z</LastModified>
    <ETag>"d41d8cd98f00b204e9800998ecf8427e"</ETag>
    <Size>0</Size>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
</ListBucketResult>
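Since the anonymous curl can list the bucket, the bucket itself is publicly readable, which suggests the earlier 403 comes from the signed credentials rather than from bucket permissions. If the cluster runs Hadoop 2.8 or later (an assumption; the anonymous provider class was added around that release), the same check can be made through s3a:
# Anonymous listing via s3a; AnonymousAWSCredentialsProvider ships with Hadoop 2.8+
hdfs dfs -Dfs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider -ls s3a://kartik-test/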
Created 01-14-2017 09:02 PM
I tried:
hadoop distcp -Dhadoop.security.credential.provider.path=jceks://hdfs/aws.jceks /nsswitch.conf s3a//kartik-test.s3-us-west-2.amazonaws.com
and it created an s3a folder in my HDFS:
[hdfs@test232 ~]$ hdfs dfs -ls
Found 3 items
drwx------ - hdfs hdfs 0 2017-01-14 07:47 .Trash
drwx------ - hdfs hdfs 0 2017-01-14 12:07 .staging
drwx------ - hdfs hdfs 0 2017-01-14 12:07 s3a
[hdfs@test232 ~]$
Created 01-15-2017 12:07 AM
Getting there... I missed a colon in my previous attempt (s3a// instead of s3a://), so the destination was parsed as a relative HDFS path instead of an S3 URI.
[hdfs@test232 ~]$ hadoop distcp -Dhadoop.security.credential.provider.path=jceks://hdfs/aws.jceks /nsswitch.conf s3a://kartik-test.s3-us-west-2.amazonaws.com
17/01/14 15:12:31 INFO s3a.S3AFileSystem: Caught an AmazonServiceException, which means your request made it to Amazon S3, but was rejected with an error response for some reason.
17/01/14 15:12:31 INFO s3a.S3AFileSystem: Error Message: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: 3094C5772AA3B4C0, AWS Error Code: SignatureDoesNotMatch, AWS Error Message: The request signature we calculated does not match the signature you provided. Check your key and signing method.
17/01/14 15:12:31 INFO s3a.S3AFileSystem: HTTP Status Code: 403
17/01/14 15:12:31 INFO s3a.S3AFileSystem: AWS Error Code: SignatureDoesNotMatch
17/01/14 15:12:31 INFO s3a.S3AFileSystem: Error Type: Client
17/01/14 15:12:31 INFO s3a.S3AFileSystem: Request ID: 3094C5772AA3B4C0
17/01/14 15:12:31 INFO s3a.S3AFileSystem: Class Name: com.amazonaws.services.s3.model.AmazonS3Exception
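SignatureDoesNotMatch typically means the stored secret doesn't match the access key, or the secret was mangled when entered (secrets containing / or + characters are a classic culprit). Regenerating the keys, as in the next post, is a common fix; the stale entry can be replaced like this (a sketch, using the provider path from the command above):
# Remove the old secret and re-enter the regenerated one at the prompt
hadoop credential delete fs.s3a.secret.key -provider jceks://hdfs/aws.jceks
hadoop credential create fs.s3a.secret.key -provider jceks://hdfs/aws.jceks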
Created 01-15-2017 01:16 AM
Regenerated the keys and updated the aws.jceks entries:
[hdfs@test232 ~]$ hadoop distcp -Dhadoop.security.credential.provider.path=jceks://hdfs/aws/aws.jceks /nsswitch.conf s3a://kartik-test.s3-us-west-2.amazonaws.com
17/01/14 20:14:59 ERROR tools.DistCp: Invalid arguments: java.io.IOException: Bucket kartik-test.s3-us-west-2.amazonaws.com does not exist
But I am able to browse the bucket over HTTP.
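If I read that error right, it is a URI-parsing issue: s3a expects only the bucket name in the authority, so kartik-test.s3-us-west-2.amazonaws.com is treated as the name of a (nonexistent) bucket. The regional endpoint goes into the fs.s3a.endpoint property instead (a sketch; the endpoint value assumes us-west-2):
# Only the bucket name in the s3a URI; the endpoint is set as a property
hadoop distcp -Dhadoop.security.credential.provider.path=jceks://hdfs/aws/aws.jceks -Dfs.s3a.endpoint=s3-us-west-2.amazonaws.com /nsswitch.conf s3a://kartik-test/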
Created 01-15-2017 03:00 PM
This worked:
[hdfs@test232 ~]$ hadoop distcp -Dhadoop.security.credential.provider.path=jceks://hdfs/aws/aws.jceks /test s3a://kartik-test/
Thanks for all your help!!