Migrating HDFS data to S3

I currently have 1 TB of data in HDFS that I am trying to migrate to S3 using the command below. Whenever I run it, the job runs very fast for about 3 hours and then slows down. I started it last week and it is still running, very slowly. Is this expected behavior?

nohup hadoop distcp -Dfs.s3a.access.key="$AWS_ACCESS_KEY_ID" -Dfs.s3a.secret.key="$AWS_SECRET_ACCESS_KEY" -Dfs.s3a.multipart.size=10485760 -Dfs.s3a.multipart.threshold=10485760 -m=300 -bandwidth 400 -update hdfs:<....> s3a://<.......>


You may want to collect the YARN application logs to understand what happened after 3 hours; for example, it may be a YARN resource issue or stuck containers.

1. Enable console debug logging, re-run distcp, and save the output:


nohup hadoop distcp -Dfs.s3a.access.key="$AWS_ACCESS_KEY_ID" -Dfs.s3a.secret.key="$AWS_SECRET_ACCESS_KEY" -Dfs.s3a.multipart.size=10485760 -Dfs.s3a.multipart.threshold=10485760 -m=300 -bandwidth 400 -update [hdfs path] [s3a path] > distcp_console.out 2>&1 &
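The redirect in the command above only captures whatever the client prints; it does not raise the log level by itself. One common way to get debug output on the console, assuming the standard hadoop launcher script is in use, is to set HADOOP_ROOT_LOGGER in the same shell before starting distcp. This is a sketch, not a required step:

```shell
# Assumption: the stock `hadoop` launcher script reads this variable.
# Set it in the shell that will start distcp so the client logs at DEBUG to the console.
export HADOOP_ROOT_LOGGER=DEBUG,console
```

Debug output can be large on a long-running job, so it may be worth unsetting the variable once the slow phase has been captured.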

2. Collect the YARN application logs:

yarn logs -applicationId [applicationID] > /tmp/distcp_application.out
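If the application ID is not at hand, it can usually be read from the distcp console output or from the YARN application list. The helper below is a minimal sketch; the function name find_distcp_app_id is hypothetical, and it assumes the job's line in the listing contains the word "distcp":

```shell
# Hypothetical helper: print the IDs of running YARN applications
# whose listing line mentions "distcp" (assumed to be in the job name).
find_distcp_app_id() {
  yarn application -list -appStates RUNNING 2>/dev/null |
    awk 'tolower($0) ~ /distcp/ { print $1 }'
}
```

The printed ID can then be passed to the yarn logs command above in place of [applicationID]. If more than one distcp job is running, the helper prints one ID per line and the right one has to be picked by hand.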

3. If there are stuck YARN containers, collect a jstack of the container's PID; refer to the post below.
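As a rough sketch of step 3 (not the content of the referenced post), the helpers below look for the container's JVM on the NodeManager host and take a thread dump of it. The function names and the /tmp output path are hypothetical, and the sketch assumes the container ID appears on the java command line and that jstack is run as the same user that owns the JVM:

```shell
# Hypothetical helper: print the PIDs of java processes whose command line
# mentions the given container ID. Run on the NodeManager host.
container_pids() {
  ps -e -o pid= -o comm= -o args= |
    awk -v cid="$1" '$2 == "java" && index($0, cid) { print $1 }'
}

# Hypothetical helper: write one thread dump per matching PID under /tmp.
# Assumption: jstack is on the PATH and is run as the user owning the JVM.
dump_container_stacks() {
  for pid in $(container_pids "$1"); do
    jstack "$pid" > "/tmp/jstack_${1}_${pid}.out" 2>&1
  done
}
```

Calling dump_container_stacks with the ID of a stuck container a few times, some seconds apart, makes it easier to see whether the same threads stay blocked in the same place.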


@willx I really appreciate the response, but it looks like I don't have access to the article.

Could you please share the solution here? I would really appreciate the help.



@VidyaSargur I don't have access to the article and am still waiting for the solution to be shared.

