container exit code 137 and memory usage is limited
Labels: Apache YARN
Created on 05-21-2018 07:41 PM - edited 09-16-2022 06:15 AM
My cluster has 147 GB of memory, and I get this error even though the server has not used its entire memory.
I can see there is memory free, and yet my jobs get killed with this error. There is no error in the logs, and I don't see anything from the dmesg command or in /var/log/messages.
It also happens randomly, and on any of the nodes. Please suggest. I have been trying to get in touch with Cloudera sales support but with no luck, and it's urgent.
Created 05-21-2018 08:57 PM
For example, if you've configured a map task to use 1 GiB of physical memory (pmem) but its code actually uses more than 1 GiB at runtime, it will get killed. The common resolution is to grant it more than 1 GiB so it can do its higher-memory work without exceeding what it is given. Another resolution, in certain cases, is to investigate whether the excess memory use is justified at all, which can be discussed with the developer of the application.
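With plain MapReduce jobs, the per-task container size and JVM heap are typically controlled by the standard MRv2 properties below (set per job or in mapred-site.xml); the values are only illustrative:

```
# Standard MRv2 property names; the values are examples, not recommendations.
# The *.memory.mb value is the container size that YARN's pmem check enforces,
# so the JVM heap in *.java.opts is usually kept roughly 20% below it.
mapreduce.map.memory.mb=2048
mapreduce.map.java.opts=-Xmx1638m
mapreduce.reduce.memory.mb=2048
mapreduce.reduce.java.opts=-Xmx1638m
```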
The randomness may depend on the amount of data the container's code processes and what it ends up doing with that data.
Have you tried increasing the memory properties of the containers, via fields such as "Map Task Memory" and "Reduce Task Memory" if these are MR jobs you are having issues with, or by passing a higher value to the --executor-memory argument of spark-submit if they are Spark jobs instead?
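For the Spark case, a minimal spark-submit sketch (the sizes, the overhead value, and the application file name are placeholders; spark.yarn.executor.memoryOverhead is the older property name, renamed spark.executor.memoryOverhead in newer Spark releases):

```
# Sketch only: sizes are placeholders to tune per workload.
# The overhead setting adds off-heap headroom that YARN counts against the container.
spark-submit \
  --executor-memory 4g \
  --driver-memory 2g \
  --conf spark.yarn.executor.memoryOverhead=512 \
  your_app.py
```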
This is all assuming you are seeing an error of the below form, since the relevant log isn't shared in your post:
… Container killed by YARN for exceeding memory limits. 1.1 GB of 1 GB physical memory used …
Created on 05-21-2018 09:39 PM - edited 05-21-2018 09:41 PM
These are not Spark jobs but Hive and Sqoop jobs that I am running. They randomly get killed throughout the day; with the same configuration they sometimes run and sometimes don't.
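(Hive on MapReduce and Sqoop jobs both run as ordinary MR tasks under YARN, so if raising the limits is the route taken, the same MR properties from the reply above can be set per Hive session or passed to Sqoop; the sizes, connection string, and table name below are placeholders.)

```
-- Per-session overrides in Hive (on MapReduce); values are illustrative
SET mapreduce.map.memory.mb=4096;
SET mapreduce.map.java.opts=-Xmx3276m;
SET mapreduce.reduce.memory.mb=4096;
SET mapreduce.reduce.java.opts=-Xmx3276m;
```

```
# Sqoop launches MR map tasks, so the same properties can be passed with -D
sqoop import -Dmapreduce.map.memory.mb=4096 \
             -Dmapreduce.map.java.opts=-Xmx3276m \
             --connect jdbc:mysql://db.example.com/mydb --table mytable
```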
Created 05-24-2018 06:07 PM
@Harsh J : Could you please respond? It's a production cluster, and this error is disrupting our workflows when we run into it.
