Speeding up deployment of app binaries

Contributor

Are there any recommendations to speed up deployment of app binaries to YARN?

 

I've been submitting apps through the RM REST APIs, with the binaries located on HDFS. This takes a long time when the binaries to be deployed as a YARN app are large (say, 500MB or more), and also when I need a large number of containers.

 

I could probably speed this up by:

 

1. Lowering the default HDFS replication factor of 3 for the binaries

2. Using the HDFS cluster-wide cache, which can help avoid block reads

3. Using YARN resource localization

 

Do you have any recommendations that are known to speed this up?

 

Thanks,

Sumit

Accepted Solutions
Re: Speeding up deployment of app binaries

Master Guru
Do the binaries change from job to job, or are they static? If they do not change, you could benefit from the new shared resource cache feature: https://issues.apache.org/jira/browse/YARN-1492

Otherwise, compression is your answer. The APIs allow you to add "archives" (zips), which are automatically unpacked at the attempt's root during localisation.
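
For illustration, here is a minimal sketch of how an archive could be declared when submitting through the RM REST API. It builds the "local-resources" fragment that sits under "am-container-spec" in the submit request body, with the resource type set to ARCHIVE. The helper names, the HDFS path, and the size and timestamp values are hypothetical placeholders. In a real submission the size and timestamp should match the file's actual status on HDFS.

```python
import json


def archive_resource(name, hdfs_uri, size, timestamp, visibility="APPLICATION"):
    """Build one local-resources entry that asks YARN to unpack an archive.

    name       -- key for the entry (hypothetical helper parameter)
    hdfs_uri   -- location of the zip on HDFS
    size       -- file length in bytes, as reported by HDFS
    timestamp  -- file modification time in milliseconds, as reported by HDFS
    visibility -- PUBLIC, PRIVATE or APPLICATION
    """
    return {
        "key": name,
        "value": {
            "resource": hdfs_uri,
            "type": "ARCHIVE",  # ARCHIVE resources are unpacked during localisation
            "visibility": visibility,
            "size": size,
            "timestamp": timestamp,
        },
    }


def local_resources(entries):
    """Wrap entries in the structure used inside am-container-spec."""
    return {"local-resources": {"entry": list(entries)}}


if __name__ == "__main__":
    # Placeholder values: replace with the real path, size and mtime of your zip.
    entry = archive_resource(
        "binaries",
        "hdfs://namenode:8020/apps/myapp/binaries.zip",
        524288000,
        1405452071209,
    )
    print(json.dumps(local_resources([entry]), indent=2))
```

The printed fragment would be merged into the JSON body of the application submission request. Whether PUBLIC visibility is usable depends on the permissions of the file on HDFS, so APPLICATION is used as the default here.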

Have you also profiled which part is the bigger bottleneck? Is it the initial upload to HDFS, or the localisation on each NM?

Re: Speeding up deployment of app binaries

Contributor

I also think we could compress the binaries before copying them to HDFS and have YARN uncompress them somehow?
