Speeding up deployment of app binaries

Contributor

Are there any recommendations to speed up deployment of app binaries to YARN?

I've been using the RM REST APIs to submit apps with binaries located on HDFS. This tends to take a lot of time when the binaries to be deployed as a YARN app are large (say, 500 MB or more), and also when I need a large number of containers.

I could probably speed this up by:

1. Turning off the default 3-way replication on HDFS

2. Using the HDFS cluster-wide cache, which can help avoid block reads

3. Using YARN resource localization

Do you have any recommendations that are definitely known to speed this up?

Thanks,

Sumit

1 ACCEPTED SOLUTION

Mentor
Do the binaries change from job to job, or are they static? If they do not change, you could benefit from the new shared resource cache feature: https://issues.apache.org/jira/browse/YARN-1492

Otherwise, compression is your answer. The APIs let you add "archives" (zips), which are automatically unpacked at the attempt's root during localisation.
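As a rough illustration of the archive approach, here is a minimal Python sketch. It zips a directory of binaries and builds one `local-resources` entry of type `ARCHIVE` in the shape the RM REST submission body (`POST /ws/v1/cluster/apps`) documents. The helper names, paths, and HDFS URI are hypothetical, and the upload to HDFS is assumed to happen separately. The `size` and `timestamp` values should come from the file's status on HDFS after upload, not from the local copy.

```python
import os
import zipfile


def zip_binaries(src_dir, zip_path):
    """Pack every file under src_dir into a deflate-compressed zip.

    zip_path should live outside src_dir so the archive does not include itself.
    Returns the size of the resulting zip in bytes.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Store paths relative to src_dir so the archive unpacks cleanly.
                zf.write(full, arcname=os.path.relpath(full, src_dir))
    return os.path.getsize(zip_path)


def archive_resource_entry(link_name, hdfs_uri, size, timestamp_ms):
    """Build one `local-resources` entry that marks the file as an ARCHIVE.

    size and timestamp_ms are assumed to be the length and modification
    time (in milliseconds) reported by HDFS for the uploaded file.
    """
    return {
        "key": link_name,
        "value": {
            "resource": hdfs_uri,
            "type": "ARCHIVE",
            "visibility": "APPLICATION",
            "size": size,
            "timestamp": timestamp_ms,
        },
    }
```

The returned entry would go into the `entry` list under `am-container-spec` / `local-resources` in the submission JSON. How much the zip helps depends on how compressible the binaries are.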

Have you also profiled which part is the bigger concern? Is it the initial upload to HDFS, or the localisation on each NodeManager (NM)?
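One simple way to start answering that from the client side is to time each phase separately. The following is only a sketch: the `timed` helper and the phase names are hypothetical, and it measures wall-clock time around whatever code you wrap, nothing YARN-specific.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per phase name.
timings = {}


@contextmanager
def timed(phase):
    """Add the elapsed time of the wrapped block to timings[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[phase] = timings.get(phase, 0.0) + elapsed
```

For example, wrap the HDFS upload in `with timed("upload"):` and the wait between submission and the app reaching RUNNING in `with timed("submit_to_running"):`. The second figure only approximates localisation time, since it also includes scheduling; the NM logs would give a more direct view.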

2 REPLIES

Contributor

I also think we could compress the binaries before copying them to HDFS and have YARN uncompress them somehow?
