Contributor
Posts: 27
Registered: 07-20-2016

Different resources (CPU, memory) for tasks of the same job

YARN allocates the same CPU and memory resources to all tasks of a job.

 

We have a use case where a small task will take 500 MB of memory and a bigger task will take 12 GB. Is there any way to achieve different resources for different tasks of the same job?

Master
Posts: 368
Registered: 07-01-2015

Re: Different resources (CPU, memory) for tasks of the same job

If you are referring to MR jobs, I think the answer is no; the map and reduce memory are fixed for the whole job.
Contributor
Posts: 27
Registered: 07-20-2016

Re: Different resources (CPU, memory) for tasks of the same job

Thanks @Tomas79

 

Yes, for MapReduce jobs. How about Apache Mesos? Is it possible with Mesos to have per-task resources?

 

Posts: 1,760
Kudos: 378
Solutions: 281
Registered: 07-31-2013

Re: Different resources (CPU, memory) for tasks of the same job

You can have unique per-container resources under a YARN app. However,
neither Spark nor MR applications over YARN provide such a feature, as
there has not been a great need for it. Spark, for example, favours running
threaded tasks inside a larger-memory container, and MR keeps it simple
with uniform requests.
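
To illustrate, here is a minimal sketch of a custom ApplicationMaster mixing container sizes within one application via the AMRMClient API. The registration values are stubbed out, the allocate()/launch loop is omitted, and the sizes are just examples:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class MixedSizeRequests {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(new Configuration());
    rmClient.start();
    rmClient.registerApplicationMaster("", 0, "");

    // A small container for a light task. YARN aggregates requests per
    // priority, so each distinct capability goes under its own priority.
    rmClient.addContainerRequest(new ContainerRequest(
        Resource.newInstance(512, 1), null, null, Priority.newInstance(1)));

    // A large container for a heavy task, in the same application.
    rmClient.addContainerRequest(new ContainerRequest(
        Resource.newInstance(12288, 1), null, null, Priority.newInstance(2)));
  }
}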

Is dividing your compute into near-equal parts not possible? That would give
you higher parallelism.
Contributor
Posts: 27
Registered: 07-20-2016

Re: Different resources (CPU, memory) for tasks of the same job

Thanks @Harsh J

 

Yes, it is not possible to parallelize the larger task; that is a business requirement.

 

Right now the tasks of a job take anywhere from 500 MB to 12 GB, so we allocate 12 GB to every task and cannot utilize the cluster effectively. We are using DRF, and memory is the dominant resource. All three Fair Scheduler policies are memory-based; is there any custom CPU-only policy?

 

Or

 

Is it possible to hack the YARN code / rewrite the scheduler, etc.? Is this a possibility? We have knowledge (it can be a database table) of which task will take how much memory.

Posts: 1,760
Kudos: 378
Solutions: 281
Registered: 07-31-2013

Re: Different resources (CPU, memory) for tasks of the same job

> Is it possible to hack the YARN code / rewrite the scheduler, etc.? Is this a possibility? We have knowledge (it can be a database table) of which task will take how much memory.

Certainly possible, since each request made to the RM is treated uniquely.

A scheduler/server-side change can't help you here - YARN is relatively straightforward in its function of scheduling and nothing beyond that. Each request is handled independently.

The changes you're looking at would need to be made on the application end (by application I mean the YARN app, such as the MR or Spark frameworks).

The MR2 Application Master code is fairly involved and you may need to spend some time fully figuring out everything it handles, but this is the point where resource requests are created for tasks based on their type/ID: https://github.com/cloudera/hadoop-common/blob/cdh5.15.0-release/hadoop-mapreduce-project/hadoop-map...

You can consider forking this into your own custom YARN app, or propose a viable enhancement upstream for this.
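
To make the idea concrete, the heart of such a change would be replacing the single job-wide capability with a per-task lookup. This is a purely hypothetical sketch - the class and the lookup table are invented for illustration and are not a Hadoop API:

import java.util.Map;

import org.apache.hadoop.yarn.api.records.Resource;

// Hypothetical helper for a forked AM: pick each task's container size
// from an externally maintained mapping (e.g. your database table).
public class PerTaskCapability {
  private final Map<String, Integer> memoryMbByTaskId;

  public PerTaskCapability(Map<String, Integer> memoryMbByTaskId) {
    this.memoryMbByTaskId = memoryMbByTaskId;
  }

  public Resource capabilityFor(String taskId) {
    // Fall back to the worst-case 12 GB when a task is not in the table.
    int memoryMb = memoryMbByTaskId.getOrDefault(taskId, 12288);
    return Resource.newInstance(memoryMb, 1);
  }
}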

Wouldn't recommend it though - could become a maintenance burden over time.

Are you seeing a measurable impact of the higher memory requests in terms of concurrency? Since the smaller data sizes (500 MiB in your example) require lower memory, they should be completing quicker too - perhaps that helps compensate for the higher requests?

Alternatively you can consider breaking the jobs into their own granular sets, each with a different threshold of memory requests.
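
For example, the same pipeline could be submitted as two jobs with different job-wide settings (class names, job names and thresholds below are illustrative only):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SplitBySize {
  public static void main(String[] args) throws Exception {
    // Subset with small inputs: modest containers.
    Configuration smallConf = new Configuration();
    smallConf.setInt("mapreduce.map.memory.mb", 512);
    Job smallJob = Job.getInstance(smallConf, "pipeline-small");

    // Subset with large inputs: 12 GB containers, but only for this job.
    Configuration bigConf = new Configuration();
    bigConf.setInt("mapreduce.map.memory.mb", 12288);
    Job bigJob = Job.getInstance(bigConf, "pipeline-big");

    // Mapper/reducer classes and input/output paths would be set per
    // subset before submitting each job; omitted for brevity.
  }
}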
Contributor
Posts: 27
Registered: 07-20-2016

Re: Different resources (CPU, memory) for tasks of the same job

>>> Are you seeing a measurable impact of the higher memory requests in terms of concurrency? Since the smaller data sizes (500 MiB in your example) require lower memory, they should be completing quicker too - perhaps that helps compensate for the higher requests?

 

Yes, we thought about this, and this is what is happening on the cluster; right now we are overcommitting memory so that all the vCores are used.

 

I will take a look at the MapReduce client-side code you pointed to.

 

Thanks for the help.
