Zookeeper: zookeeper-zookeeper-server-xxxx.out deleted but still open

I am currently having a serious problem with the Zookeeper output file. I recently noticed that /var/log had 15G in use, but du -h /var/log reported less than 1G.

When I checked for deleted files (lsof | grep deleted | grep /var/log), I found a number of log files that have been deleted but are still open. The most concerning of these is /var/log/zookeeper/zookeeper-zookeeper-server-xxxx.out, which is over 13G in size. Because a process still holds it open, the file keeps occupying disk space even though it has been deleted.
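For anyone reproducing the check without lsof, here is a minimal sketch that walks /proc directly. It assumes Linux and a stat that supports -L and -c (GNU coreutils or BusyBox); the function name list_deleted_open is made up for illustration. Run it as root to see file descriptors that belong to other users' processes.

```shell
# List files under a directory that are deleted but still held open.
# Linux-only sketch: relies on /proc/<pid>/fd symlinks ending in " (deleted)".
# Pass the directory without a trailing slash, e.g. /var/log.
list_deleted_open() {
    dir=$1
    for fd in /proc/[0-9]*/fd/*; do
        # Unreadable or vanished descriptors are skipped silently.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            "$dir"/*" (deleted)")
                pid=${fd#/proc/}      # strip the leading /proc/
                pid=${pid%%/*}        # keep only the PID component
                # Size in bytes of the still-open file, or ? if unavailable.
                size=$(stat -L -c %s "$fd" 2>/dev/null || echo "?")
                printf '%s\t%s\t%s\t%s\n' "$pid" "$fd" "$size" "$target"
                ;;
        esac
    done
}
```

Calling list_deleted_open /var/log prints one tab-separated line per open descriptor: PID, the /proc path of the descriptor, size in bytes, and the original path.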

Our system admin suggested restarting Zookeeper to release the open file handle. Unfortunately, this is a production cluster, so restarting Zookeeper is an absolute last resort. I have come up with some possible solutions:

  • Since we have a quorum of 3 servers, would it be realistic to restart only the Zookeeper server on the machine that has the problem?
  • If that is not realistic, can I truncate the file (as per this article) to clear the space issue without causing problems to Zookeeper?

If neither of these is possible, what are my other options?
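To make the truncation option concrete, here is a rough sketch of what I have in mind. It assumes Linux, where a deleted file can still be reached through /proc/<pid>/fd/<n>, and sufficient permissions (file owner or root). The function name truncate_open_fd is hypothetical, and the PID and descriptor number would come from the lsof output. I have not tried this against Zookeeper itself.

```shell
# Truncate a deleted-but-still-open file through its /proc descriptor link.
# Usage: truncate_open_fd <pid> <fd-number>
truncate_open_fd() {
    pid=$1
    fdnum=$2
    link=/proc/$pid/fd/$fdnum
    target=$(readlink "$link") || return 1
    # Safety guard: only touch descriptors the kernel marks as deleted.
    case $target in
        *" (deleted)") ;;
        *) echo "refusing: $target is not marked deleted" >&2; return 1 ;;
    esac
    # Reopening the link with O_TRUNC empties the underlying file.
    true > "$link"
}
```

One caveat I am aware of: if the writing process did not open the file in append mode, its file offset is unchanged after the truncation, so later writes land at the old offset and the file becomes sparse. The apparent size stays large, but the blocks that were freed remain free.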

