Created 02-11-2017 06:35 PM
Good day,
I've installed the HDP 2.5 sandbox to try out on my laptop and found that it kept slowly allocating space (without me doing any real activity, such as loading in data) until my hard drive ran out of space. I then installed it on a server with a 100GB partition and saw the same behavior. It took a little over a week, but without any real activity occurring (that I was aware of) on the VM, it eventually used up the full 100GB and crashed the VM, forcing me to do a re-install.
Does anyone have any idea why it would just continually allocate space, as I can't figure out what is growing? I would like to turn it off so that I don't have to keep re-installing the VM.
It's running on VMware, if that helps. I believe I installed it on VirtualBox when I set it up on my laptop, and I experienced the same behavior there.
Any assistance is greatly appreciated!
Greg
Created 02-11-2017 06:38 PM
I can't say for sure why, but I would recommend suspending your VM when it is not in use. It is not designed to be a long-running instance. To save your work, don't turn off the box; suspend it instead. This makes things much easier when you resume activities.
Created 02-13-2017 02:33 AM
I can't speak to the Cloudera VM, as my CDH VM has faulted on me several times. However, what you can do is go into the services you are using and set the log4j properties to retain no more than x days of logs.
For example, for Kafka:
This will only allow 9 backup logs.
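A setting with that effect might look like the fragment below. This is only a sketch, assuming log4j 1.x and Kafka's usual `kafkaAppender` appender name; the file path and the 10MB size are illustrative assumptions, not values taken from this thread, so check them against the service's own log4j settings (in Ambari, under the service's advanced log4j config).

```properties
# Assumed appender name and file path; verify against your Kafka log4j config
log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
# Roll the file once it reaches this size (illustrative value)
log4j.appender.kafkaAppender.MaxFileSize=10MB
# Keep at most 9 rolled-over files; older ones are deleted
log4j.appender.kafkaAppender.MaxBackupIndex=9
```

With a size-based `RollingFileAppender`, the disk used by that one log is bounded by roughly `MaxFileSize` times (`MaxBackupIndex` + 1).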
Here is a good article on how to control log sizes and retention for each HDP service.
Created 02-15-2017 07:43 PM
Also, I can't speak to the overall stability of CDH, as I definitely haven't performed a lot of activity on it. I just know that I didn't experience this particular issue there, which was a little unnerving for me.
Created 02-15-2017 07:40 PM
I'll give it a try and let you know how it works. I'll accept your answer in the meantime, as it looks promising. Thanks!
Created 02-11-2017 07:00 PM
As @Sunile Manjee pointed out, the Sandbox isn't intended for long-running usage. Most Hadoop components can generate a lot of logging, and my guess is that logging is the likely culprit. You don't have to be heavily using the Sandbox for the components to generate logs: regular service checks run in the background and produce log output of their own. When you are not using the Sandbox, you should shut it down or suspend it.
Created 02-11-2017 11:45 PM
Currently, I need it running for my purposes. Is there any way that I can turn off logging globally and only turn it on for troubleshooting? I was using the Cloudera sandbox and had it running for 3 months straight without this issue, so it's a little concerning to me that this could be present even if/when we productionalize. Also, any ideas how I can confirm whether it's caused by logging or not?
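One way to check whether logs are what is growing is to measure which directories take up the most space and compare the numbers a day or two apart. The script below is a minimal sketch of that idea, not a Hortonworks tool: the function names `dir_size` and `largest_subdirs` are made up for illustration, and `/var/log` is only an assumed starting point, so point it at `/` or another directory if the growth is elsewhere.

```python
import os


def dir_size(path):
    """Return the total size in bytes of regular files under path."""
    total = 0
    # os.walk does not follow symlinked directories by default
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                if not os.path.islink(file_path):
                    total += os.path.getsize(file_path)
            except OSError:
                # File vanished or is unreadable; skip it
                pass
    return total


def largest_subdirs(base, top=10):
    """Return up to `top` (size_in_bytes, path) pairs for the immediate
    subdirectories of base, largest first."""
    sizes = []
    try:
        entries = list(os.scandir(base))
    except OSError:
        return sizes
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            sizes.append((dir_size(entry.path), entry.path))
    sizes.sort(reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    # Assumed location of service logs; change as needed
    for size, path in largest_subdirs("/var/log"):
        print(f"{size / 1024 ** 2:10.1f} MiB  {path}")
```

Run it as root inside the VM so unreadable directories are not skipped. If the same subdirectory grows between runs, that service's logging is a likely candidate.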