10-02-2014 06:18 AM
I am trying to figure out why we see swapping in our cluster.
We are running a CDH 4.4 cluster and recently enabled the cgroups setting. After this we began to see swapping warnings on our hosts, typically between 100 and 5000 pages swapped per 15 minutes, more often towards the lower end of this range.
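For reference, here is a minimal sketch of how the swap counts could be sampled directly on a host. It assumes a Linux kernel that exposes the cumulative `pswpin` and `pswpout` counters in `/proc/vmstat`; the function names are my own, and the numbers may not match exactly how Cloudera Manager computes its warning.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed 'name <non-negative integer>' lines.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Pages swapped in and out between two snapshots of the counters."""
    return {
        key: after.get(key, 0) - before.get(key, 0)
        for key in ("pswpin", "pswpout")
    }


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the counters from a Linux host (path is an assumption)."""
    with open(path) as f:
        return parse_vmstat(f.read())
```

Calling `read_vmstat()` twice, some minutes apart, and passing both results to `swap_delta` gives the pages swapped in and out during that interval.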
However, our machines are fairly powerful in all respects. Memory-wise, they all have 136 GB of RAM, and I can see in the Host memory usage graphs that the amount of used memory practically never goes above 20 GB. Most of the RAM is used for file caches (physical_memory_cached, ~100 GB). This seems to leave about 10-15 GB of free memory which can be used.
When running top on a node I can see that the amount of free memory varies rapidly, from these 10-15 GB down to 300-400 MB.
However, as I understand it, all the cached memory should be available to any process that requests it, so there should always be plenty of memory available.
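To check the free-plus-cached arithmetic outside the graphs, something like the following sketch could be run on a node. It assumes the usual Linux `/proc/meminfo` layout with values in kB; the helper names are hypothetical, and the sum is only a rough upper bound, since `Cached` also counts shared memory and tmpfs pages that cannot simply be dropped.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style lines ('MemFree:  123 kB') into a dict of kB values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip lines without a 'Name:' prefix or without a numeric value.
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def reclaimable_estimate_kb(meminfo):
    """Rough upper bound on usable memory: free + page cache + buffers (kB)."""
    return (
        meminfo.get("MemFree", 0)
        + meminfo.get("Cached", 0)
        + meminfo.get("Buffers", 0)
    )


def read_meminfo(path="/proc/meminfo"):
    """Read and parse memory statistics from a Linux host (path is an assumption)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

For example, `reclaimable_estimate_kb(read_meminfo()) / 1024 ** 2` gives the estimate in GiB for the host it runs on.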
According to this post (https://groups.google.com/a/cloudera.org/forum/#!msg/cdh-user/TcxxzhoO3o0/6UtugxglYZYJ), in some cases swapping may occur even though there is plenty of cached memory to use. I don't know whether this is what is happening in our case.
Since the default threshold for the swapping warning in Cloudera Manager is zero, I guess you (Cloudera) believe that normally there shouldn't be any swapping going on. Therefore I want to strive for this on our cluster.
I'd be thankful for any insights or tips on how to understand and proceed with this situation.