According to the Hortonworks recommendation (2. File System Partitioning Recommendations, "Setting Up File System Partitions"):
"Use the following as a base configuration for all nodes in your cluster:
• Root partition: OS and core program files
• Swap: Size 2X system memory"
We can see that Hortonworks suggests that swap on each machine should be twice the size of its memory.
If the master machines have 32G of RAM, then swap should be 64G.
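The 2X rule is plain arithmetic. Here is a minimal Python sketch of it, assuming sizes in GB. The helper name legacy_swap_size_gb is hypothetical and only illustrates what the rule yields for the 32G masters in this example.

```python
def legacy_swap_size_gb(ram_gb: float, factor: float = 2.0) -> float:
    """Swap size under the legacy 'factor x RAM' rule (hypothetical helper)."""
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    # The legacy rule simply multiplies physical RAM by a fixed factor.
    return ram_gb * factor


if __name__ == "__main__":
    # 32G master nodes, as in the example above.
    print(legacy_swap_size_gb(32))  # 64.0
```

Passing factor=1.5 gives the 1.5x variant of the same rule.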
But from the system side, this could put performance at risk, because swap is twice the size of memory.
Let me give an example:
When memory resources are exhausted, the OS uses swap.
Swap is very slow memory, so the Ambari cluster will be negatively affected when a large part of the swap is in use.
So from a pure system point of view, it is very hard to accept that swap should be twice the size of memory.
I would be happy to get remarks or ideas about swap, and to know the right swap-to-memory ratio if we do not want to risk cluster performance.
@Michael Bronson For worker/data nodes, using swap is not recommended, because, as you said, "swap is very slow memory, so the Ambari cluster will be negatively affected when a large part of the swap is in use." Please refer to this post for a very good explanation: https://community.hortonworks.com/questions/22548/what-is-the-hortonworks-recommendation-on-swap-usa...
So in that case, what is the minimum swap size to use in the cluster? Let's say we have a Hadoop cluster with master machines, each with 32G of RAM. What is the best practice for swap (5G, 8G, 12G, 15G, or something else)?
The practice of using 2x memory for swap space is very old and out of date. It was useful at a time when systems had, for example, 256MB of RAM, and it does not apply today. Using swap space on Hadoop nodes, whether workers or masters, is not recommended, because it will not prevent you from having issues, even when swap space is being used because RAM has hit the threshold defined by the swappiness parameter.
In the referenced post:
"If you need to use more memory than the amount of RAM which has been purchased, or expect to need more, and can accept severe degradation in failure, then you would need a lot of swap configured. You're better off buying the right amount of memory."
"The fear with disabling swap on masters is that an OOM (out of memory) event could affect cluster availability. But that will still happen even with swap configured; it just will take slightly longer. Good administrator/operator practice would be to monitor RAM availability, then fix any issues before running out of memory."
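The quoted advice to monitor RAM availability could look like this in practice. This is a minimal Python sketch that assumes a Linux host. It parses the standard /proc/meminfo file, where the MemAvailable field is present on Linux 3.14 and later. The helper names and the 10% threshold are hypothetical choices for illustration. They are not an Ambari or Hortonworks setting.

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def low_memory(info: dict, min_available_ratio: float = 0.10) -> bool:
    """True if MemAvailable is below the given fraction of MemTotal (hypothetical threshold)."""
    return info["MemAvailable"] < info["MemTotal"] * min_available_ratio


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        swap_used_kb = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
        print("MemAvailable kB:", info.get("MemAvailable"))
        print("Swap used kB:", swap_used_kb)
        if "MemAvailable" in info and low_memory(info):
            print("WARNING: available RAM is low; investigate before the node runs out of memory")
```

In a real cluster the same check would normally feed an existing monitoring or alerting system rather than print to the console.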
If you really need swap configured on your master nodes, set it to whatever size you like, for example 1/4 of total system memory, and set the swappiness value to 0.
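Here is a minimal Python sketch of that suggestion, assuming sizes in GB. The helper name master_swap_plan is hypothetical. It computes 1/4 of RAM and emits the standard Linux vm.swappiness sysctl line. That value can be applied at runtime with "sysctl -w vm.swappiness=0" or persisted in /etc/sysctl.conf.

```python
def master_swap_plan(ram_gb: float, fraction: float = 0.25, swappiness: int = 0) -> tuple:
    """Return (swap size in GB, sysctl line) for a master node (hypothetical helper)."""
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    if swappiness < 0:
        raise ValueError("swappiness cannot be negative")
    # Size swap as a fraction of RAM (1/4 by default, per the suggestion above).
    swap_gb = ram_gb * fraction
    # Line in /etc/sysctl.conf syntax.
    return swap_gb, f"vm.swappiness = {swappiness}"


if __name__ == "__main__":
    print(master_swap_plan(32))  # (8.0, 'vm.swappiness = 0')
```

For the 32G masters discussed above, this gives 8G, which is one of the options listed in the question.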
It is not mandatory, but if you use swap your processes will be slower, so it is highly recommended that you disable swap and THP (Transparent Huge Pages) as follows.
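Before and after disabling swap and THP, it helps to verify the current state. This is a minimal Python sketch that only checks that state on a Linux host. It assumes the standard /proc/swaps file and the mainline THP sysfs path /sys/kernel/mm/transparent_hugepage/enabled. Some distributions use a different path. The helper names are hypothetical.

```python
import os

# Mainline kernel path; some distributions expose THP under a different directory.
THP_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_swap_devices(proc_swaps_text: str) -> list:
    """Return swap device/file names listed in /proc/swaps (the first line is a header)."""
    lines = [line for line in proc_swaps_text.splitlines() if line.strip()]
    return [line.split()[0] for line in lines[1:]]


def thp_mode(enabled_text: str) -> str:
    """Return the bracketed (active) mode, e.g. 'never' from 'always madvise [never]'."""
    for token in enabled_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active THP mode found")


if __name__ == "__main__":
    if os.path.exists("/proc/swaps"):
        with open("/proc/swaps") as f:
            print("active swap:", active_swap_devices(f.read()) or "none")
    if os.path.exists(THP_PATH):
        with open(THP_PATH) as f:
            print("THP enabled mode:", thp_mode(f.read()))
```

Once swap is off, the active swap list should be empty. Once THP is disabled, the reported mode should be "never".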
As per best practices, I agree with you that 1.5x or 2x RAM is the swap size. When you are working on a VM, you can use the swappiness parameter to control memory usage. For example, with the default swappiness of 60, the system automatically starts using swap once memory comes under pressure. For physical machines, you can pick less than 2x for the configuration.
Also, as a rule of thumb, using swap directly is about 200% slower than physical RAM.