Spark job taking more memory than it is given
Labels:
- Apache Hadoop
- Apache Spark
- Apache YARN
Created 10-30-2017 01:35 PM
Hi,
As the title says. Let's say I'm submitting a Spark job like this:
spark-submit --class streaming.test --master yarn --deploy-mode cluster --name some_name --executor-memory 512m --executor-cores 1 --driver-memory 512m some.jar
The job is submitted and running, as you can see here:
screenshot-6.jpg
But as you can see, I gave the job 512 MB of RAM and YARN allocated 3 GB, and this happens for every Spark job I submit. Can someone point out where I'm going wrong?
UPDATE:
I have 3 RMs, and yarn.scheduler.minimum-allocation-mb is set to 1024. Is it showing 3 GB because of this 1024 * (number of RMs)?
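For reference, here is a quick way to confirm the minimum-allocation value YARN is actually using (the config path is an assumption for a typical HDP layout; adjust to your cluster):
# Check the effective scheduler minimum allocation on the RM host
grep -A1 "yarn.scheduler.minimum-allocation-mb" /etc/hadoop/conf/yarn-site.xml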
Created 10-30-2017 03:59 PM
The 3 GB is the total memory across all containers. The 4 apps in that screenshot show 3 GB because they each have 3 running containers. If you look at the app in the 3rd row, you will see it has only 1 container and hence 1024 MB.
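As a rough sketch of where those numbers come from (assuming the 1024 MB minimum allocation mentioned in the question and Spark's default memoryOverhead of max(384 MB, 10% of the requested memory)):
# Each 512m driver/executor request becomes one container, rounded up to the scheduler minimum:
echo $(( ( (512 + 384 + 1023) / 1024 ) * 1024 ))   # 896 MB requested -> 1024 MB container
echo $(( 3 * 1024 ))                               # 1 AM/driver + 2 executors = 3072 MB, the "3 GB" in the UI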
Created 10-30-2017 04:20 PM
Hi @Gour Saha
Is it somehow possible to allocate just 512 MB? The jobs aren't so "expensive" that they need 3-4 GB of RAM.
Thank you 🙂
Created 10-30-2017 04:25 PM
Yes, set yarn.scheduler.minimum-allocation-mb to 512 MB or less.
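One caveat worth noting (again assuming Spark's default memoryOverhead of max(384 MB, 10% of executor memory)): even with a 512 MB minimum allocation, a 512m executor still ends up in a 1 GB container, because the overhead is added before YARN rounds up:
echo $(( 512 + 384 ))                          # 896 MB actually requested per 512m executor
echo $(( ( (512 + 384 + 511) / 512 ) * 512 ))  # rounded up to a multiple of 512 -> 1024 MB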
Created 10-30-2017 04:31 PM
Actually I already did that: I killed the app and submitted the same app with the same config, and it took 3 GB again. I'll give it another shot and give you feedback ASAP.
Created 10-30-2017 04:36 PM
Make sure you restart all YARN services (RMs, NMs) after the change.
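If the daemons are managed manually rather than through a management UI, the restart would look roughly like this on each node (Hadoop 2.x sbin scripts assumed):
# On each ResourceManager host:
yarn-daemon.sh stop resourcemanager && yarn-daemon.sh start resourcemanager
# On each NodeManager host:
yarn-daemon.sh stop nodemanager && yarn-daemon.sh start nodemanager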
Created 10-30-2017 04:43 PM
Yeah, of course I did, as it was suggested.
I tested once more, and the same job is still taking 3 GB. This is how my config looks now: screenshot-7.jpg
Created 10-30-2017 04:49 PM
I really have a feeling that YARN is overriding the parameters I'm passing. Also, I tried setting --num-executors to 2, but it set 3, as you can see in the first picture above.
Created 10-30-2017 04:56 PM
One container is always the AM (application master), that's why it is 3. Can you click on the application ID in the first row, and then click on the attempt ID link and then on each of the 3 container ID links to see how much memory each container is taking?
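The same drill-down can also be started from the CLI if that is easier (the application and attempt IDs below are placeholders):
# List the attempts for the application, then the containers belonging to the running attempt;
# from each container's ID you can follow the corresponding entry in the RM UI to see its memory.
yarn applicationattempt -list application_1509370000000_0001
yarn container -list appattempt_1509370000000_0001_000001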
Created 10-30-2017 05:02 PM
I decreased yarn.scheduler.minimum-allocation-mb to 256 MB.
The spark-submit configs are now the following:
--executor-memory 256m --executor-cores 1 --num-executors 1 --driver-memory 512m
I needed to keep --driver-memory at 512 MB since the application wouldn't start otherwise. So with these configs the application is taking 2 GB of RAM and, as you were asking, the job is spread across 2 containers as you assumed, and each is taking 1024 MB.
UPDATE:
In the INFO output of the Spark job I can see this:
17/10/30 17:57:10 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
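That log line matches the default overhead discussed above; a minimal check of the numbers (256 MB minimum allocation assumed, as now configured):
echo $(( 512 + 384 ))                     # 512m driver + 384 MB overhead = the 896 MB in the log
echo $(( ( (896 + 255) / 256 ) * 256 ))   # rounded up to a multiple of 256 -> the 1024 MB container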
