DistCp between two S3 buckets?

Does DistCp between two S3 buckets work? If so, is it the same as a regular DistCp, or how can it be achieved?

Accepted solution

Hi @Rajesh Reddy

I've only tried it from S3 to a regular HDFS cluster, but I don't see why S3-to-S3 wouldn't work. Instead of the hdfs:// prefix for the source and destination, use s3a://, with the bucket name where the host would normally go. If your S3A keys are already defined elsewhere (for example, in core-site.xml), you don't need to pass them in-line as in this example:

hadoop distcp -Dfs.s3a.access.key="<access key>" -Dfs.s3a.secret.key="<secret key>" s3a://source-bucket/dir/file s3a://dest-bucket/dir/file
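If the two buckets need different credentials (for example, they belong to different AWS accounts), a single fs.s3a.access.key/fs.s3a.secret.key pair won't cover both. A minimal sketch, assuming Hadoop 2.8 or later, where S3A accepts per-bucket options of the form fs.s3a.bucket.<bucket>.access.key; the function name s3_to_s3_distcp and the SRC_*/DST_* variables are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: DistCp between two S3 buckets that use different credentials,
# via S3A per-bucket options (assumes Hadoop 2.8+).
# s3_to_s3_distcp and the SRC_*/DST_* variables are hypothetical names.
s3_to_s3_distcp() {
  local src_bucket="$1" dst_bucket="$2" path="$3"
  # Each bucket gets its own key pair; S3A uses these instead of the
  # global fs.s3a.access.key / fs.s3a.secret.key for that bucket only.
  hadoop distcp \
    "-Dfs.s3a.bucket.${src_bucket}.access.key=${SRC_ACCESS_KEY}" \
    "-Dfs.s3a.bucket.${src_bucket}.secret.key=${SRC_SECRET_KEY}" \
    "-Dfs.s3a.bucket.${dst_bucket}.access.key=${DST_ACCESS_KEY}" \
    "-Dfs.s3a.bucket.${dst_bucket}.secret.key=${DST_SECRET_KEY}" \
    "s3a://${src_bucket}/${path}" "s3a://${dst_bucket}/${path}"
}
```

Usage: export the four key variables, then run `s3_to_s3_distcp source-bucket dest-bucket dir/file`. Keys passed with -D are visible in the process list, so keeping them in core-site.xml or a Hadoop credential provider is usually preferable.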
