HDP 2.5 Sandbox Keeps running out of space

Explorer

Good day,

I've installed the HDP 2.5 sandbox to try out on my laptop and found that it was continuing to allocate space slowly (without me doing any real activity, such as loading in data) until my hard drive ran out of space. I then found a server that had a 100GB partition which I installed it on and found the same behavior. It took a little over a week, but without any real activity occurring (that I was aware of) on the VM, it eventually just ran out of 100GB of space and crashed the VM... forcing me to do a re-install.

Does anyone have any idea why it would just continually allocate space, as I can't figure out what is growing? I would like to turn it off so that I don't have to keep re-installing the VM.

It's running on VMware, if that helps. I believe I installed it on VirtualBox when I set it up on my laptop, and I experienced the same behavior there.

Any assistance is greatly appreciated!

Greg

1 ACCEPTED SOLUTION

Super Guru

I can't say for sure why, but I would recommend suspending your VM when it is not in use. The Sandbox is not designed to run as a long-lived instance. To save your work, don't turn off the box; suspend it instead. This makes things much easier when you resume activities.

7 REPLIES

Super Guru

I am not sure about the Cloudera VM; my CDH VM has faulted on me several times, so I can't speak to that point. However, what you can do is go into the services you are using and set the log4j properties so that only a bounded number of log files is retained.

For example, for Kafka:

  log4j.appender.kafkaAppender=org.apache.log4j.RollingFileAppender
  log4j.appender.kafkaAppender.MaxFileSize=100MB
  log4j.appender.kafkaAppender.MaxBackupIndex=9

This keeps at most 9 backup logs of up to roughly 100MB each, in addition to the active log file.
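To put a number on what those settings allow, here is a minimal sketch of the worst-case footprint for a single RollingFileAppender. The helper name `max_log_footprint_mb` is hypothetical, and the result is only an approximate upper bound, since a file can slightly exceed MaxFileSize before it rolls over.

```python
def max_log_footprint_mb(max_file_size_mb, max_backup_index):
    """Approximate upper bound, in MB, on disk used by one RollingFileAppender.

    The appender keeps the active log file plus up to `max_backup_index`
    rolled-over backups, each capped at roughly `max_file_size_mb`.
    """
    return max_file_size_mb * (max_backup_index + 1)


if __name__ == "__main__":
    # Values taken from the Kafka example: 100MB files, 9 backups.
    print(max_log_footprint_mb(100, 9))  # prints 1000
```

So this one appender is capped at about 1GB. Each service (and each appender within it) has its own cap, so the totals add up across everything running on the Sandbox.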

Here is a good article on how to control log sizes and retention for each HDP service:

https://community.hortonworks.com/content/kbentry/8882/how-to-control-size-of-log-files-for-various-...

Explorer

Also, I can't speak to the overall stability of CDH, as I definitely haven't performed a lot of activity on it... I just know that I didn't experience this particular issue, which was a little unnerving for me.

Explorer

I'll give it a try and let you know how it works... I'll accept your answer in the meantime as it looks promising. Thanks!

@Greg Frair

As @Sunile Manjee pointed out, the Sandbox isn't intended for long-running usage. Most Hadoop components can generate a lot of logging, and my guess is that logging is the likely culprit. You don't have to be using the Sandbox heavily for the components to generate logs: regular service checks run on their own and write log entries each time. When you are not using the Sandbox, you should shut it down or suspend it.
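One way to test the logging guess is to measure which directories are actually growing. The script below is a rough sketch, not an official tool: the function names are hypothetical, and `/var/log` is an assumption about where the Sandbox services write their logs, so adjust the path if yours differ. It avoids newer Python syntax because the Python version on the Sandbox is also an assumption.

```python
import os


def dir_size_bytes(path):
    """Total size of regular files under `path`, skipping unreadable entries."""
    total = 0
    # os.walk does not follow symlinked directories by default.
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # File vanished or is unreadable; ignore it.
                pass
    return total


def largest_subdirs(parent, top=10):
    """Return up to `top` (path, size_in_bytes) pairs, largest first."""
    try:
        names = os.listdir(parent)
    except OSError:
        return []
    sizes = []
    for name in names:
        full = os.path.join(parent, name)
        if os.path.isdir(full) and not os.path.islink(full):
            sizes.append((full, dir_size_bytes(full)))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    # Assumed log location; change this to wherever your services log.
    for path, size in largest_subdirs("/var/log"):
        print("{0:10.1f} MB  {1}".format(size / (1024.0 * 1024.0), path))
```

Run it once, leave the Sandbox idle for a few hours, and run it again. If the same few directories keep growing, logging is the cause; if they stay flat while the disk still fills, point the script at `/` or another parent directory to look elsewhere.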

Explorer

Currently, I need it running for my purposes... is there any way that I can turn off logging globally and only turn it on for troubleshooting? I was using the Cloudera sandbox and had it running for 3 months straight without this issue, so it's a little concerning to me that this could be present even if/when we productionalize. Also, any ideas on how I can confirm whether it's caused by logging or not?
