Migrating HDFS data to S3
Labels: HDFS
Created 03-02-2025 11:51 PM
Hi,
Currently I have 1 TB of data in HDFS that I am trying to migrate to S3 using the command below. Whenever I run this command, the job runs very fast for about 3 hours and then slows down drastically; I started it last week and it is still running, very slowly. Is this expected behavior?
nohup hadoop distcp \
  -Dfs.s3a.access.key="$AWS_ACCESS_KEY_ID" \
  -Dfs.s3a.secret.key="$AWS_SECRET_ACCESS_KEY" \
  -Dfs.s3a.fast.upload=true \
  -Dfs.s3a.fast.buffer.size=1048576 \
  -Dfs.s3a.multipart.size=10485760 \
  -Dfs.s3a.multipart.threshold=10485760 \
  -Dmapreduce.map.memory.mb=8192 \
  -Dmapreduce.map.java.opts=-Xmx7360m \
  -m=300 -bandwidth 400 -update hdfs:<....> s3a://<.......>
Created 03-04-2025 02:36 AM
@Zubair123, Welcome to our community! To help you get the best possible answer, I have tagged our HDFS experts @willx @ChethanYM, who may be able to assist you further.
Please feel free to provide any additional information or details about your query, and we hope you will find a satisfactory solution to your question.
Regards,
Vidya Sargur, Community Manager
Created 03-04-2025 10:10 AM
You may want to collect the YARN application logs to understand what happened after the first 3 hours; for example, it may be a YARN resource issue or stuck containers.
1. Turn on console debug logging, re-run distcp, and save the output:
export HADOOP_ROOT_LOGGER=DEBUG,console
nohup hadoop distcp \
  -Dfs.s3a.access.key="$AWS_ACCESS_KEY_ID" \
  -Dfs.s3a.secret.key="$AWS_SECRET_ACCESS_KEY" \
  -Dfs.s3a.fast.upload=true \
  -Dfs.s3a.fast.buffer.size=1048576 \
  -Dfs.s3a.multipart.size=10485760 \
  -Dfs.s3a.multipart.threshold=10485760 \
  -Dmapreduce.map.memory.mb=8192 \
  -Dmapreduce.map.java.opts=-Xmx7360m \
  -m=300 -bandwidth 400 -update [hdfs path] [s3a path] > distcp_console.out 2>&1 &
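Once the run is captured, you will probably want to turn console debug logging back off; a small cleanup step, assuming HADOOP_ROOT_LOGGER was exported only for this shell session:
# revert to the default console log level for subsequent commands
unset HADOOP_ROOT_LOGGER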
2. Collect the YARN application logs:
yarn logs -applicationId [applicationID] > /tmp/distcp_application.out
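If the application ID is not at hand, it can be looked up with the standard YARN CLI; a minimal sketch (the grep filter on the job name is just an illustration):
# list running applications and note the application ID of the distcp job
yarn application -list -appStates RUNNING
# optionally narrow the list down by name
yarn application -list -appStates RUNNING | grep -i distcp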
3. If there are stuck YARN containers, collect a jstack of the container PID; refer to the post below.
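Since that post may not be accessible to everyone, here is a minimal sketch of the jstack collection, assuming shell access to the NodeManager host running the stuck container ([container_id] and [pid] are placeholders):
# on the NodeManager host, find the JVM process of the stuck container
ps -ef | grep [container_id]
# take a couple of thread dumps a few seconds apart, as the same user that owns the process
jstack [pid] > /tmp/container_jstack_1.out
sleep 10
jstack [pid] > /tmp/container_jstack_2.out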
Created 03-11-2025 10:23 AM
@willx I really appreciate the response. It looks like I don't have access to the article; could you please share the solution? I'd really appreciate the help.
Thanks,
Zubair.
Created 03-10-2025 11:04 PM
@Zubair123, Did the response help resolve your query? If it did, kindly mark the relevant reply as the solution, as it will aid others in locating the answer more easily in the future.
Regards,
Vidya Sargur, Community Manager
Created 03-11-2025 10:23 AM
@VidyaSargur I don't have access to the article and am still waiting for the solution to be shared.
Created 03-12-2025 10:57 PM
@Zubair123, This article is available exclusively for our customers. If you're a customer, please contact our customer support team for more details. If you’re not, our sales team would happily assist you with any information you need.
Regards,
Vidya Sargur, Community Manager
