Contributor
Posts: 35
Registered: ‎07-27-2015
Accepted Solution

Speeding up deployment of app binaries

Are there any recommendations to speed up deployment of app binaries to YARN?

 

I've been submitting apps through the RM REST APIs, with the binaries located on HDFS. Deployment takes a long time when the binaries for the YARN app are large (say, 500MB or more), and also when I need a large number of containers.

 

I could probably speed this up by:

 

1. Lowering the default HDFS replication factor of 3 for the binaries

2. Using the HDFS cluster-wide cache, which can help avoid block reads

3. Using YARN resource localization

 

Do you have any recommendations that are known to speed this up?

 

Thanks,

Sumit

Contributor
Posts: 35
Registered: ‎07-27-2015

Re: Speeding up deployment of app binaries

I also think we could compress the binaries before copying them to HDFS and have YARN uncompress them somehow?

Posts: 1,896
Kudos: 433
Solutions: 303
Registered: ‎07-31-2013

Re: Speeding up deployment of app binaries

Do the binaries change from job to job, or are they static? If they do not change, you could benefit from the new shared resource cache feature: https://issues.apache.org/jira/browse/YARN-1492

Otherwise, compression is your answer. The APIs allow you to add "archives" (zips), which are automatically unpacked at the attempt's root during localisation.
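Since you are submitting through the RM REST APIs, here is a rough sketch of what an archive entry could look like in the "local-resources" part of the submission body. The field names follow the documented submit-application request; the HDFS path, key, size, and timestamp are made-up placeholders, and the helper function name is only for illustration. As far as I know, the size and timestamp have to match what HDFS reports for the file, so in practice you would read them from the file status rather than hardcode them.

```python
import json

VALID_VISIBILITY = {"PUBLIC", "PRIVATE", "APPLICATION"}


def archive_local_resource(key, hdfs_uri, size, timestamp_ms,
                           visibility="APPLICATION"):
    """Build one local-resources entry that YARN should treat as an archive.

    Hypothetical helper: only the dictionary layout mirrors the REST body.
    """
    if visibility not in VALID_VISIBILITY:
        raise ValueError("unknown visibility: %s" % visibility)
    return {
        "key": key,  # name of the link created in the container's directory
        "value": {
            "resource": hdfs_uri,
            "type": "ARCHIVE",       # ask the NM to unpack it when localising
            "visibility": visibility,
            "size": size,            # placeholder, must match the HDFS file
            "timestamp": timestamp_ms,  # placeholder, must match the HDFS file
        },
    }


# Placeholder values, not taken from a real cluster.
entry = archive_local_resource(
    key="app",
    hdfs_uri="hdfs://namenode:8020/apps/myapp/app-binaries.zip",
    size=524288000,
    timestamp_ms=1438000000000,
)
am_container_spec = {"local-resources": {"entry": [entry]}}
print(json.dumps(am_container_spec, indent=2))
```

The resulting fragment would go under "am-container-spec" in the body you POST to the RM. If the binaries are the same for every user and job, PUBLIC visibility may also be worth trying, since the NM can then reuse its local copy across applications.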

Have you also profiled which part worries you more? Is it the initial upload to HDFS, or the localisation on each NM?
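For a first measurement of the upload side, something as simple as timing the copy command may be enough. This is only a sketch: the function name is hypothetical, and the hdfs command in the comment uses placeholder paths. It does not cover the localisation side, which you would have to measure separately on the NMs.

```python
import subprocess
import sys
import time


def timed_run(cmd):
    """Run a command and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    completed = subprocess.run(cmd, check=False)
    return completed.returncode, time.monotonic() - start


# On a cluster node you might pass something like (placeholder paths):
#   ["hdfs", "dfs", "-put", "-f", "app-binaries.zip", "/apps/myapp/"]
# A no-op Python command is used here so the sketch runs anywhere.
exit_code, seconds = timed_run([sys.executable, "-c", "pass"])
print("exit code %d, took %.2f s" % (exit_code, seconds))
```

Comparing this number with the total time until your containers start should give a rough idea of which phase dominates.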