I am having a serious problem with ZooKeeper's output file. I recently noticed that /var/log had 15G in use, but du -h /var/log reported less than 1G.
I checked for deleted files (lsof | grep deleted | grep /var/log) and noticed that a number of log files have been deleted but are still open. The most concerning of these is /var/log/zookeeper/zookeeper-zookeeper-server-xxxx.out, which is over 13G in size and, despite being deleted, is still held open.
Our system admin suggested restarting ZooKeeper to release the open file handle. Unfortunately, this is a production cluster, so restarting ZooKeeper is an absolute last resort. I have come up with some possible solutions:
Since we have a quorum of 3 servers, would it be realistic to restart only the ZooKeeper server on the machine that has the problem?
If that is not realistic, can I truncate the file (as per this article) to free the space without causing problems for ZooKeeper?
If neither of these is possible, what are my other options?
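
For context, here is a minimal sketch of the truncation idea mentioned above, run against a throwaway temp file rather than the real log. It assumes Linux (/proc), bash, and GNU stat. File descriptor 3 and the 1 MiB size are arbitrary choices for illustration, and I have not confirmed that ZooKeeper's writer behaves like this append-mode stand-in.

```shell
#!/usr/bin/env bash
# Demonstration on a temporary file only; nothing here touches ZooKeeper.
set -eu

tmp=$(mktemp)                    # stand-in for the deleted .out file
exec 3>>"$tmp"                   # keep it open in append mode on fd 3 (assumed writer behaviour)
head -c 1048576 /dev/zero >&3    # write 1 MiB through the open descriptor
rm "$tmp"                        # remove the name; the data stays allocated while fd 3 is open

link=$(readlink "/proc/$$/fd/3")            # on Linux the target normally ends in " (deleted)"
before=$(stat -L -c %s "/proc/$$/fd/3")     # size of the unlinked but still open file

: > "/proc/$$/fd/3"              # truncate through the /proc entry without closing fd 3

after=$(stat -L -c %s "/proc/$$/fd/3")
echo "link=$link"
echo "before=$before after=$after"

exec 3>&-                        # close the descriptor; the inode is now released
```

On the real system, the path would be /proc/<pid>/fd/<n> for whichever process and descriptor lsof reports. One caveat I am aware of: if the writing process did not open the file in append mode, its next write lands at the old offset, so on filesystems that support sparse files the apparent size can jump back up even though the freed blocks stay free.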