04-21-2015 11:11 PM
I set up a test environment built from VMs so I could test the process of upgrading from Cloudera 4.8 to 5.3. The 4.8 installation was bare-bones, with no configuration changes made by me; it was as Cloudera configured it. The upgrade of Cloudera Manager went smoothly, except that all the nodes came up with the Memory Overcommit Validation Threshold configuration error.
I built the VMs with 16 GB of RAM, and was told that I had committed 25 GB. I decided that the easy solution was to increase the physical RAM to 32 GB. When I did that I still got the error, but now I was being told that I had committed 50 GB of RAM.
I was given the ability to change a single variable that accepts a number from 0 to 1. I tried 0, 1, and several numbers in between, but it had no effect that I could see. Am I safe in ignoring this warning/error? I can't open a support case, since this test environment is a short-term test running on a 60-day license and has no support. If I do the upgrade in an actual supported environment, will the configuration be stable until I get help from Cloudera support?
At this point in time I have no confidence in being able to perform the upgrade!!
04-30-2015 10:51 PM
The overcommit message is a warning that if all the components on the host use all the memory they're allowed to use, the server will exceed its physical memory and start swapping. Servers do not perform well at that point and processes tend to hang or fail, so this should be avoided. The only reasons the assigned memory in the warning would increase to 50 GB are that you assigned more services or that you upgraded.
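To make the arithmetic behind the warning concrete, here is a rough sketch of this kind of check. It is not Cloudera Manager's actual validation code: the function name, the role names, the per-role figures, and the exact comparison (sum of role ceilings against threshold times physical RAM) are all assumptions for illustration.

```python
GIB = 1024 ** 3

def overcommit_report(physical_bytes, role_max_bytes, threshold):
    """Compare the sum of per-role memory ceilings with a fraction of host RAM.

    Assumption: the check is 'committed > threshold * physical'. The real
    Cloudera Manager validation may include other terms.
    """
    committed = sum(role_max_bytes.values())
    allowed = physical_bytes * threshold
    return {
        "committed_gib": committed / GIB,
        "allowed_gib": allowed / GIB,
        "overcommitted": committed > allowed,
    }

# Hypothetical role ceilings chosen only so they add up to 25 GiB on a 16 GiB host.
roles = {"role_a": 10 * GIB, "role_b": 15 * GIB}
for threshold in (0.5, 0.8, 1.0):
    print(threshold, overcommit_report(16 * GIB, roles, threshold))
```

Under these assumptions, a host that has committed more than its physical RAM stays flagged at every threshold from 0 to 1, which would be consistent with the setting appearing to have no effect.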
Typically in this case you should:
- Assign more memory to the host
- Move services to other hosts, possibly adding new hosts
- Decrease memory assignments to services (note: if you drop these too low, services may fail so you need to know the impact of what you change)
- Stop or remove services you don't need
However, if none of the above is an option, you're safe ignoring the warning so long as your host does not hit swap (which is also warned about in later versions of CM). As far as I'm aware, the overcommit threshold should not impact your upgrade in any way unless you hit swap while starting the cluster after upgrading. Please also remember that this is a warning about the *maximum* memory usage being more than the server can handle. Many services will use very little memory if they're not actively running things.
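If you want to check for yourself whether a host has started using swap, one option on Linux is to read /proc/meminfo. The helper below is only a sketch; the function name is made up, and it assumes the usual "Key:   value kB" layout with SwapTotal and SwapFree fields present.

```python
def swap_used_kib(meminfo_text):
    """Return swap in use (KiB) from the text of /proc/meminfo.

    Assumes each line looks like 'Key:   12345 kB' and that the
    SwapTotal and SwapFree fields are present.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]
```

On a Linux host you could call it as swap_used_kib(open("/proc/meminfo").read()). A result that stays at 0 while the cluster is starting and running jobs suggests the host is not swapping.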
Hopefully this clarifies things a little. Should you have problems with a supported upgrade please raise a support ticket and we will be happy to assist.