Explorer
Posts: 62
Registered: 01-22-2014

Spark Resource Management using YARN


Hi,

 

As I understand it, the Static Service Pools option in Cloudera Manager makes it possible to control the amount of resources allocated to each service managed alongside YARN (HDFS/MR/Impala/HBase, etc.).

 

For Spark, in the application list, I was not able to configure the maximum amount of RAM allocated; controlling CPU and container memory, however, was possible.

 

In such a case, how can I manage the resources allocated to Spark in a cluster where YARN also manages other applications? Say, for example, I want to cache a 2 GB file in RAM, but within the resources allocated by YARN. How can I do that and also be sure that other applications/users are not impacted?
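
For concreteness, here is a minimal sketch of the kind of job I mean (the HDFS path and memory figures are made up; the properties are the standard Spark ones as far as I know):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("CacheExample")
      // Per-executor heap; the number of executors is given to spark-submit
      // with --num-executors in YARN mode.
      .set("spark.executor.memory", "2g")

    val sc = new SparkContext(conf)

    // MEMORY_AND_DISK spills to disk instead of failing if the granted memory
    // turns out to be smaller than the cached data.
    val data = sc.textFile("hdfs:///data/big-2gb-file.txt")
      .persist(StorageLevel.MEMORY_AND_DISK)

    println("Cached " + data.count() + " lines")
    sc.stop()
  }
}

What I cannot tell is what stops a job like this from taking more than Spark's share of the cluster.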

 

Please let me know.

 

I am using Cloudera Manager version 5.1.2 and CDH version 5.1.2.

 

Thanks,

Arun

Cloudera Employee
Posts: 366
Registered: 07-29-2013

Re: Spark Resource Management using YARN

Do you just want to change how much memory Spark asks for? Those are Spark options, really; look at options like --executor-memory on spark-submit. Do you want to enforce limits on how much Spark is ever allowed to ask for? Yes, use pools. See http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Managing-... Are you not able to select memory limits there?
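
For illustration, a sketch of the "asking" side (the memory figure and app name are just examples, not a recommendation):

// What Spark asks for is set by the submitting application, e.g. on the command line:
//   spark-submit --master yarn --executor-memory 2g --num-executors 4 ... myJob.jar
// or, equivalently, via SparkConf in the driver:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("MyJob")
  .set("spark.executor.memory", "2g") // per-executor JVM heap requested from YARN

val sc = new SparkContext(conf)

Whether YARN actually grants that much is then a matter of the limits configured on the cluster side.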

Explorer
Posts: 62
Registered: 01-22-2014

Re: Spark Resource Management using YARN

Hi,

 

My question is this: using --executor-memory, my job asks for x amount of memory.

 

But YARN should make sure that there is an upper limit on the amount of memory allocated to Spark. This option (specifying the maximum RAM for Spark) is the one I was not able to find. Using Static Service Pools I was able to specify only CPU and Java heap space for Spark.

 

So is there any option to specify this?

Cloudera Employee
Posts: 366
Registered: 07-29-2013

Re: Spark Resource Management using YARN

I may have to defer to others here who are more expert in YARN, but when I try to create a static pool and assign percentages, it appears to configure heap sizes, including those of Spark workers:

 

Total Java Heap Sizes of Worker's Executors in Bytes    # Hosts    Value                  Subtotal
spark: Worker                                           5 Hosts    ≈ 4.57 GiB / 32 GiB    ≈ 22.83 GiB

 

 

But this is allocating resources for Spark standalone mode, and you're interested in allocating resources within YARN, right? I think you need queues for that, and here I confess I don't know how this is configured. If that's what you're after, maybe the YARN forum here is worth a look?
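
If it helps, here is what the submission side looks like as a sketch; the queue name "spark_pool" is made up, and the queue and its limits would have to be defined in the YARN scheduler configuration, which is the part I'd check with the YARN folks:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("MyJob")
  .set("spark.yarn.queue", "spark_pool") // submit to a specific YARN queue; YARN enforces that queue's capacity
  .set("spark.executor.memory", "2g")    // what each executor asks for within that queue

val sc = new SparkContext(conf)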

Explorer
Posts: 62
Registered: 01-22-2014

Re: Spark Resource Management using YARN

 

As you can see, YARN was able to configure the Java heap size, but this is not the total RAM allocated to a Spark worker node, right? How do we control the total RAM allocated to a Spark worker?

 

I am referring to YARN mode only.
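
For reference, my current understanding from the Spark-on-YARN documentation (treat this as a sketch; the exact property names may differ in the Spark version shipped with CDH 5.1) is that the container YARN allocates per executor is roughly the executor heap plus an overhead:

// Requested YARN container size per executor ≈ executor heap + overhead,
// and YARN caps any single container at yarn.scheduler.maximum-allocation-mb.
val conf = new org.apache.spark.SparkConf()
  .set("spark.executor.memory", "2g")               // executor JVM heap
  .set("spark.yarn.executor.memoryOverhead", "384") // extra MB for the container

But that still seems to describe a single container rather than a cap on Spark's total RAM, which is what I am looking for.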