
Each map task of a distcp job to s3 is running several times

Contributor

Hello,

I have a cluster with HDP 2.5.0.2 (HDFS version 2.7.1.2.5) and I'm trying to distcp large files (200G each) from the on-premise cluster to NetApp S3, using fast upload. A map task is launched for each file, and each map task reports a status like this:

Task:
task_1518158406102_3329_m_000000

Status:
(84.5% Copying hdfs://HDP1/user/backup/backup-test/2tb_of_200M_each/test_io_0 to s3a://hadoop-acceptance-tests/user/backup/backup-test/test_io_0 [169.1G/200.0G] > map)

Once the progress reaches 100% (200.0G/200.0G), it starts again from 0% (0%/200.0G). This repeats a few times.

Here is the command I used to trigger distcp:

hadoop distcp -Dfs.s3a.endpoint=s3.in.myhost:8082 -Dfs.s3a.access.key=XXXXXXXXX -Dfs.s3a.secret.key=YYYYYYYYYYY -Dfs.s3a.signing-algorithm=S3SignerType \
-Dfs.s3a.buffer.dir=<Local_path>/tmp-hadoop_backup \
-Dfs.s3a.fast.upload=true \
-Dfs.s3a.fast.buffer.size=1048576 \
-Dfs.s3a.multipart.size=10485760 \
-Dfs.s3a.multipart.threshold=10485760 \
-Dmapreduce.map.memory.mb=8192 \
-Dmapreduce.map.java.opts=-Xmx7360m \
-log /backup/log/distcp_2tb \
-m=300 -bandwidth 1024 \
-skipcrccheck \
-update hdfs:/user/backup/load-test/2tb_of_200M_each s3a://hadoop-acceptance-tests/user/backup/load-test/

Q1. What does it mean when each map task runs several times?

Q2. After running for several hours, the job fails without having copied even a single file. It worked (copied successfully) for file sizes of 10G, 50G and 100G each.

Q3. Are there any recommendations or rules of thumb for tuning the configs based on input file size and upload speed to S3?

Thanks

Surya

6 REPLIES


Sounds like something is failing in that 200GB upload. I'd turn off fs.s3a.fast.upload. In HDP-2.5 it buffers into RAM, and if more data is queued for upload than there is room for in the JVM heap, the JVM will fail, which triggers the retry. If you turn it off, you will also need enough space on the temp disk for the whole file.

In HDP 2.6+ we've added disk buffering for in-progress uploads, and it is enabled by default.
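For illustration, a minimal sketch of those two options (paths and bucket names here are placeholders, not taken from the post; fs.s3a.fast.upload.buffer only exists on the newer S3A client shipped with HDP 2.6+ / Hadoop 2.8+):

# HDP 2.5: classic output stream; the whole file is staged under
# fs.s3a.buffer.dir on local disk before being uploaded
hadoop distcp \
  -Dfs.s3a.fast.upload=false \
  -Dfs.s3a.buffer.dir=/data/tmp-hadoop_backup \
  hdfs:///user/backup/src s3a://my-bucket/dest/

# HDP 2.6+ / Hadoop 2.8+: keep fast upload, but buffer each in-progress
# block on local disk instead of in the JVM heap
hadoop distcp \
  -Dfs.s3a.fast.upload=true \
  -Dfs.s3a.fast.upload.buffer=disk \
  hdfs:///user/backup/src s3a://my-bucket/dest/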

Contributor

Hi @Dominika Bialek and @stevel,

Thank you for your valuable input. The following set of configurations worked for me even at 1 TB per file. The right values depend on the infrastructure: the network bandwidth between nodes, the upload bandwidth from the data nodes to S3, and so on. It took several iterations of stress tests (many small files, a small number of big files, etc.) to tune the fast buffer size, multipart size and threshold for my cluster.

-Dmapreduce.task.timeout=0 \
-Dfs.s3a.fast.upload=true \
-Dfs.s3a.fast.buffer.size=157286400 \
-Dfs.s3a.multipart.size=314572800 \
-Dfs.s3a.multipart.threshold=1073741824 \
-Dmapreduce.map.memory.mb=8192 \
-Dmapreduce.map.java.opts=-Xmx7290m \
-Dfs.s3a.max.total.tasks=1 \
-Dfs.s3a.threads.max=10 \
-bandwidth 1024
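For readability, those byte values work out to a 150 MB fast buffer, 300 MB multipart parts and a 1 GB multipart threshold. A rough sketch of a full command assembling them (endpoint, credentials and paths reused from the original question; treat it as illustrative, not a verified recipe):

hadoop distcp -Dfs.s3a.endpoint=s3.in.myhost:8082 \
  -Dfs.s3a.access.key=XXXXXXXXX -Dfs.s3a.secret.key=YYYYYYYYYYY \
  -Dfs.s3a.signing-algorithm=S3SignerType \
  -Dmapreduce.task.timeout=0 \
  -Dfs.s3a.fast.upload=true \
  -Dfs.s3a.fast.buffer.size=157286400 \
  -Dfs.s3a.multipart.size=314572800 \
  -Dfs.s3a.multipart.threshold=1073741824 \
  -Dmapreduce.map.memory.mb=8192 \
  -Dmapreduce.map.java.opts=-Xmx7290m \
  -Dfs.s3a.max.total.tasks=1 \
  -Dfs.s3a.threads.max=10 \
  -bandwidth 1024 \
  -update hdfs:///user/backup/load-test/2tb_of_200M_each s3a://hadoop-acceptance-tests/user/backup/load-test/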

Thanks

Surya

Contributor

Is it possible for a large HDFS file not to have to be staged in full to either disk or memory, but instead to be uploaded in parts as the HDFS blocks are read? Would that require a change to distcp so that it is not one map per file, but one map per block?

Moderator

Hi @regeamor,

 

Thank you for reaching out to the community!

The DistCp command submits a regular MapReduce job that performs a file-by-file copy. The block locations of each file are obtained from the NameNode during the MapReduce job. Each DistCp mapper is started, where possible, on a node where the first block of its file is present; when a file is composed of multiple splits, the remaining blocks are fetched from neighbouring nodes if they are not available on the same node.

The fs.s3a.fast.upload option significantly accelerates data upload by writing the data in blocks, possibly in parallel.
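As an illustration only (these properties apply to the newer S3A client in Hadoop 2.8+ / HDP 2.6+, and the values shown are placeholders rather than recommendations), the block-based upload can be steered with options such as:

# buffer each in-progress block on local disk ("array" or "bytebuffer"
# keep the blocks in memory instead)
-Dfs.s3a.fast.upload.buffer=disk
# size of each block/part handed to the multipart upload
-Dfs.s3a.multipart.size=104857600
# how many blocks a single output stream may have queued or uploading
# before the writer blocks
-Dfs.s3a.fast.upload.active.blocks=4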

Please refer to How to improve performance for DistCp for more details.

Madhuri Adipudi, Technical Solutions Manager

Contributor
Thanks,

What I am experiencing is that the complete file, if it is 300GB, has to be assembled before upload to S3, which requires either 300GB of memory or 300GB of disk. Distcp does not create a part file per block, and I have not witnessed any file splitting being done. Multipart uploads require you to get an upload ID, upload many parts with a numeric index, and in the end ask S3 to put them back together. I do not see any of this being done. I admit I do not know much about all this, and it could be happening out of my sight.
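For reference, the multipart flow described above looks roughly like this when driven by hand with the AWS CLI (bucket, key and part file names here are made up; the S3A client issues the equivalent calls internally through the AWS SDK, which is why none of it is visible from distcp):

# 1. Ask S3 for an upload ID
aws s3api create-multipart-upload \
  --bucket my-backup-bucket --key backups/bigfile.dat
# -> returns {"UploadId": "..."}

# 2. Upload each numbered part (parts can be sent independently)
aws s3api upload-part \
  --bucket my-backup-bucket --key backups/bigfile.dat \
  --part-number 1 --body part-0001.bin --upload-id "<UploadId>"
# -> returns an ETag for the part; repeat for parts 2, 3, ...

# 3. Ask S3 to stitch the parts together into a single object
aws s3api complete-multipart-upload \
  --bucket my-backup-bucket --key backups/bigfile.dat \
  --upload-id "<UploadId>" \
  --multipart-upload file://parts.json  # list of {PartNumber, ETag} entries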