Is it a good practice (or necessary) to do scheduled restarts of all hosts in the cluster?
We have noticed that the heap usage of the NodeManager process gradually increases over a period of time (around 2 months) and eventually tops out. This results in GC delays. Once we restart the NodeManager role on the host, everything goes back to normal until the same thing happens again after some time.
We have observed this in the NodeManager process only. The DataNode process on the same host does not seem to have this issue, at least so far.
Is a periodic scheduled restart of all hosts something that is expected in a Hadoop cluster? If yes, what is the best way to do it (in particular, how can we automate stopping the services on a node before the scheduled restart happens)? Or is it OK for the hosts to be restarted without stopping CDH services? (I believe this might cause some kind of corruption.)
I don't think that's necessary. How much Java heap have you configured for the NodeManagers?
The symptoms you describe sound like your NM is running out of free heap, which is why it spends so much time GC-ing.
Simply increasing the Java heap size for the NodeManager may solve the issue.
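For example, on a non-CM cluster this is done in yarn-env.sh (the 2048 MB value is just an illustration; on a CM-managed cluster, set the "Java Heap Size of NodeManager in Bytes" property in the YARN service configuration instead):

```shell
# yarn-env.sh on the NodeManager host: raise the NM heap (value in MB).
# 2048 here is only an example; size it for your container count.
export YARN_NODEMANAGER_HEAPSIZE=2048
```

The new value only takes effect after the NodeManager role is restarted.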
For the host restarts, generally you should do a clean reboot of the host/OS, so the Hadoop services should be stopped on that host first. If you use CM, the generic init scripts won't stop CDH services on the host, so you should use CM to do that first.
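A minimal sketch of automating that stop step through the CM REST API (the hostname, credentials, cluster/service names, role name, and API version below are all placeholders; check the API documentation for your CM release):

```shell
# Hypothetical CM server and credentials - replace with your own.
CM_API="http://cm.example.com:7180/api/v19"
AUTH="admin:admin"

# Stop the role instances that run on the host you are about to reboot.
# The role name in "items" is a placeholder; list roles first via
# GET ${CM_API}/clusters/<cluster>/services/<service>/roles to find it.
curl -s -u "$AUTH" -X POST \
  -H "Content-Type: application/json" \
  -d '{"items": ["yarn-NODEMANAGER-abc123"]}' \
  "${CM_API}/clusters/Cluster%201/services/yarn/roleCommands/stop"

# Only after CM reports the roles as stopped:
# sudo reboot
```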
@zegab Thank you for your response.
You are correct. The NodeManager runs out of heap. The Java heap for the NodeManager is 1 GB. But the concern is that the heap usage increases gradually and tops out over a period of 2 months or so. Hence I am sure that if I increase the heap to 2 GB, it will only give me maybe 4 months, and then it will top out again. What I am trying to figure out is whether it is normal for the heap usage to increase over a period of time. It comes back down after a restart of the NodeManager instance.
Yes, it could top out at 2 GB too, but there can be a difference:
The Java VM manages memory largely automatically. It reserves memory from the heap when it is needed, and periodically frees up unused memory (this is called GC, garbage collection). When there is a request for a memory allocation but the heap is heavily used, it invokes the GC process to find and free up unused memory blocks. If the memory really is nearly full, it can take a long time until the right amount (for the new allocation) can be freed up. This is called a GC pause, and if it happens frequently, it can degrade the performance of the process.
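You can watch this pattern directly with jstat against the NodeManager process (the pgrep pattern and sampling interval here are examples):

```shell
# Find the NM pid by its main class, then sample GC stats every 5 seconds.
# Watch the old-gen occupancy (O) and full-GC time (FGCT) columns: an O
# that climbs steadily and never drops after a full GC is the bad sign.
NM_PID=$(pgrep -f org.apache.hadoop.yarn.server.nodemanager.NodeManager)
jstat -gcutil "$NM_PID" 5000
```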
This can be a forewarning of potential OOM crashes: when no more memory can be freed up but an allocation is needed, the process will be killed.
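If it ever gets that far, standard JVM flags added to the NodeManager's Java options (in CM, the "Java Configuration Options" field for the NodeManager role) will capture evidence for later analysis; the paths below are placeholders, and the GC-logging syntax shown is for Java 8:

```shell
# Dump the heap on OOM and log GC activity (example paths).
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/hadoop-yarn/nm_heap.hprof
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-Xloggc:/var/log/hadoop-yarn/nm_gc.log
```

The .hprof file can then be opened in a heap-analysis tool to see what is actually accumulating.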
I don't expect an NM to consume all of its heap indefinitely, unless you are using some old version of CDH that has a bug in the NM. Even very large hosts (with 256 GB of physical memory) can get by with a 2-4 GB heap for a NodeManager. The exact value depends on how many containers you can run in parallel: if the NM launches more containers, it will need more memory to track them.
The concerning thing for you is the GC pauses. If you increase the heap to 2 GB, it could all get used up too, but if that gets rid of the GC pauses, your NM will be in much better health.