New Contributor
Posts: 5
Registered: 03-09-2017

Unable to move data to an S3 bucket using the latest CDH (5.14.0)

Hi!

I am trying to move data from HDFS to an S3 bucket. I am using the latest version of CM/CDH (5.14.0). I have been able to copy data using the AWS CLI:

aws s3api put-object

I can also copy data with the Python SDK, but I cannot copy data with hadoop distcp. I have added the following extra properties to core-site.xml in the HDFS service:

<property>
    <name>fs.s3a.access.key</name>
    <value>X</value>
</property>
<property>
    <name>fs.s3a.secret.key</name>
    <value>X</value>
</property>
<property>
    <name>fs.s3a.endpoint</name>
    <value>s3.us-east-2.amazonaws.com</value>
</property>

Nothing happens when I execute a command like

hadoop distcp /blablabla s3a://bucket-name/

It just hangs for a while without any output (I guess it is retrying several times). The same thing happens when I simply try to list the files in the bucket with

hadoop fs -ls s3a://bucket-name

I am sure it is not a credentials problem, since I can connect with the same access and secret keys using the Python SDK and the AWS CLI.

Is anyone facing a similar issue? Thanks!

Cloudera Employee
Posts: 3
Registered: 09-09-2015

Re: Unable to move data to an S3 bucket using the latest CDH (5.14.0)

DistCp can take some time to complete, depending on the size of your source data.

One thing to try is listing a public bucket. I believe that if you have no credentials set you'll see an error, but with any valid credentials you should be able to list it:

hadoop fs -ls s3a://landsat-pds/
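
If the public bucket lists but your own does not, a further check is to rerun the failing command with the fs.s3a properties passed explicitly as Hadoop generic -D options, so the result no longer depends on which core-site.xml the client read. The Python sketch below is a hypothetical helper (hadoop_cmd and masked are made-up names, and the key values are the placeholders from the question) that builds such a command line, prints it with the keys redacted, and only runs it when a hadoop client is on the PATH.

```python
import shlex
import shutil
import subprocess

# Token prefixes whose values must never be printed.
SECRET_PREFIXES = ("fs.s3a.access.key=", "fs.s3a.secret.key=")


def hadoop_cmd(subcommand, args, props=None):
    """Build an argv list such as: hadoop fs -D key=value -ls s3a://bucket-name/.

    Generic options (-D key=value) must sit between the subcommand
    ("fs", "distcp", ...) and that subcommand's own arguments.
    """
    cmd = ["hadoop", subcommand]
    for key, value in (props or {}).items():
        cmd += ["-D", f"{key}={value}"]
    return cmd + list(args)


def masked(cmd):
    """Render the command as a shell string with credential values redacted."""
    shown = [
        tok.split("=", 1)[0] + "=REDACTED" if tok.startswith(SECRET_PREFIXES) else tok
        for tok in cmd
    ]
    return shlex.join(shown)


if __name__ == "__main__":
    # Placeholder values taken from the question; substitute your own.
    props = {
        "fs.s3a.access.key": "X",
        "fs.s3a.secret.key": "X",
        "fs.s3a.endpoint": "s3.us-east-2.amazonaws.com",
    }
    cmd = hadoop_cmd("fs", ["-ls", "s3a://bucket-name/"], props)
    print(masked(cmd))
    # Only run it if a hadoop client is installed; the timeout stops a hang
    # like the one in the question from blocking forever.
    if shutil.which("hadoop"):
        try:
            subprocess.run(cmd, check=False, timeout=300)
        except subprocess.TimeoutExpired:
            print("command did not finish within 300 seconds")
```

The same helper works for the copy, e.g. hadoop_cmd("distcp", ["/blablabla", "s3a://bucket-name/"], props). Keys passed with -D are visible in the process list, so this is meant as a one-off test rather than a permanent setup.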

Also, make sure you have deployed the client configuration in Cloudera Manager (CM).
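
To confirm the properties actually reached the host you run hadoop from, you can read the deployed client core-site.xml. A minimal Python sketch, assuming the usual CDH client path /etc/hadoop/conf/core-site.xml (the function names are hypothetical); it lists every fs.s3a.* property it finds and redacts the two key values:

```python
import os
import xml.etree.ElementTree as ET

# Assumed CDH default for the deployed client configuration on a gateway host.
DEFAULT_CORE_SITE = "/etc/hadoop/conf/core-site.xml"
# Credential properties whose values are hidden in the report.
SECRET_NAMES = {"fs.s3a.access.key", "fs.s3a.secret.key"}


def parse_s3a_properties(xml_text):
    """Return every fs.s3a.* property (name -> value) in core-site.xml content."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name.startswith("fs.s3a."):
            props[name] = (prop.findtext("value") or "").strip()
    return props


def load_s3a_properties(path=DEFAULT_CORE_SITE):
    """Read a core-site.xml file from disk and extract its fs.s3a.* properties."""
    with open(path, "rb") as f:
        return parse_s3a_properties(f.read())


def report(props):
    """Format properties one per line, redacting the credential values."""
    lines = [
        f"{name} = {'REDACTED' if name in SECRET_NAMES else props[name]}"
        for name in sorted(props)
    ]
    return "\n".join(lines) if lines else "no fs.s3a.* properties found"


if __name__ == "__main__":
    if os.path.exists(DEFAULT_CORE_SITE):
        print(report(load_s3a_properties(DEFAULT_CORE_SITE)))
    else:
        print(f"{DEFAULT_CORE_SITE} not found on this host")
```

If the endpoint or keys are missing from the output, the change has most likely not been deployed to that host yet. For a different location, call load_s3a_properties with that path.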
