Posts: 35
Registered: ‎07-27-2015
Accepted Solution

Speeding up deployment of app binaries

Are there any recommendations to speed up deployment of app binaries to YARN?


I've been using the ResourceManager (RM) REST APIs to submit apps whose binaries are located on HDFS. Deployment takes a long time when the binaries are large (say, 500 MB or more), and also when I need a large number of containers.


I could probably speed this up by:


1. Turning off the default 3x replication on HDFS for the binaries

2. Using the HDFS cluster-wide cache, which can help avoid block reads

3. Using YARN resource localization


Do you have any recommendations that are definitely known to speed this up?




Posts: 35
Registered: ‎07-27-2015

Re: Speeding up deployment of app binaries

I also wonder whether we could compress the binaries before copying them to HDFS and have YARN uncompress them somehow?

Posts: 1,836
Kudos: 416
Solutions: 295
Registered: ‎07-31-2013

Re: Speeding up deployment of app binaries

Do the binaries change job-to-job, or are they static? If they do not change, you could benefit from the new shared resource cache feature.

Otherwise, compression is your answer. The APIs allow you to add "archives" (zips), which are automatically unpacked at the attempt's root when localising.
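As a rough illustration of the archive approach when submitting through the RM REST API, here is a minimal Python sketch that builds the "local-resources" part of a submission body with a resource of type ARCHIVE. The helper names, the HDFS URI, the size, the timestamp and the launch command are all placeholders I made up for the example; the size and timestamp should match what HDFS reports for the uploaded file, and as far as I understand the key becomes the name under which the unpacked archive is linked in the container's working directory. PUBLIC visibility is one of the documented values (alongside PRIVATE and APPLICATION) and is assumed here only because it lets a NodeManager reuse a localised copy.

```python
import json


def archive_local_resource(key, hdfs_uri, size, timestamp, visibility="PUBLIC"):
    """Build one local-resources entry that asks YARN to unpack an archive.

    size and timestamp are expected to match the file status on HDFS;
    the values passed in below are placeholders, not real ones.
    """
    return {
        "key": key,
        "value": {
            "resource": hdfs_uri,
            "type": "ARCHIVE",        # FILE would localise it without unpacking
            "visibility": visibility,
            "size": size,
            "timestamp": timestamp,
        },
    }


def am_container_spec(entries, command):
    """Assemble the am-container-spec fragment of a submission body."""
    return {
        "local-resources": {"entry": list(entries)},
        "commands": {"command": command},
    }


if __name__ == "__main__":
    # Hypothetical URI, size, timestamp and command, for illustration only.
    entry = archive_local_resource(
        key="app",
        hdfs_uri="hdfs://namenode:8020/apps/myapp.zip",
        size=524288000,
        timestamp=1438000000000,
    )
    print(json.dumps(am_container_spec([entry], "./app/bin/start.sh"), indent=2))
```

This only covers the AM container described in the REST submission; containers requested later by the AM would need the same kind of resource entry set through whatever API the AM uses.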

Have you also profiled which part worries you more? Is it the initial upload-to-HDFS period, or the localising (per NM) period instead?
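For the upload half of that question, a crude wall-clock measurement is often enough to start with. The sketch below just times an arbitrary command from Python; the function name is my own, and the file paths in the usage line are hypothetical. It says nothing about the per-NM localising time, which would have to be worked out separately (for example from the NodeManager logs).

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command to completion and return its wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(argv, check=True)  # raises CalledProcessError on a non-zero exit
    return time.monotonic() - start


if __name__ == "__main__":
    # Pass the command to time as arguments to this script.
    if len(sys.argv) > 1:
        print("elapsed: %.2f s" % time_command(sys.argv[1:]))
```

For instance, time_command(["hdfs", "dfs", "-put", "-f", "myapp.zip", "/apps/myapp.zip"]) would time a single upload, assuming the hdfs client is on the PATH and both paths are replaced with real ones.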