How to clean up files in Zookeeper Directory

Expert Contributor

I am unable to restart ZooKeeper on one of the nodes in the cluster because the mount that the ZooKeeper directory points to is full. How can I safely remove or archive some of the files under the ZooKeeper directory? Also, what kind of data does ZooKeeper record? Is it safe to remove these files on one particular ZooKeeper node while the cluster is up and running? Please advise.

I see a bunch of files in the following format:

1. log.170003cfd4

2. snapshot.170003cfd4

1 ACCEPTED SOLUTION

Master Mentor

The ZooKeeper server continually saves znode snapshot files and transaction logs in a Data Directory to enable you to recover data; optionally, the transaction logs can be written to a separate directory. The snapshot.* files are the snapshots and the log.* files are the transaction logs. It's a good idea to back up the ZooKeeper Data Directory periodically. Although ZooKeeper is highly reliable because a persistent copy is replicated on each server, recovering from backups may be necessary if a catastrophic failure or user error occurs.

When you use the default configuration, the ZooKeeper server does not remove the snapshots and log files, so they accumulate over time. You will need to clean up this directory occasionally, taking into account your backup schedules and processes. To automate the cleanup, a zkCleanup.sh script is provided in the bin directory of the zookeeper base package. Modify this script as necessary for your situation. In general, you want to run this as a cron task based on your backup schedule.
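Before deleting or archiving anything, it can help to see which files a cleanup would touch. The following is only a rough Python sketch of that selection logic, not the actual zkCleanup.sh implementation. The function names (parse_names, purge_candidates, list_candidates) are made up for illustration. It assumes the suffix of each file name is a hexadecimal transaction id, keeps the newest `retain` snapshots, and keeps every log that might still hold transactions newer than the oldest kept snapshot. It is a dry run and never deletes anything.

```python
import re
from pathlib import Path

# File names look like "snapshot.170003cfd4" or "log.170003cfd4";
# the suffix is assumed to be a hexadecimal transaction id (zxid).
NAME_RE = re.compile(r"^(snapshot|log)\.([0-9a-fA-F]+)$")


def parse_names(names):
    """Split file names into sorted (zxid, name) lists for snapshots and logs."""
    snaps, logs = [], []
    for name in names:
        m = NAME_RE.match(name)
        if not m:
            continue  # ignore anything that is not a snapshot or log file
        entry = (int(m.group(2), 16), name)
        (snaps if m.group(1) == "snapshot" else logs).append(entry)
    return sorted(snaps), sorted(logs)


def purge_candidates(names, retain=3):
    """Return file names that could be archived while keeping `retain` snapshots."""
    if retain < 3:
        raise ValueError("keep at least 3 snapshots")
    snaps, logs = parse_names(names)
    if len(snaps) <= retain:
        return []  # nothing old enough to remove
    oldest_kept = snaps[-retain][0]
    # A log is named after the first zxid it contains, so the newest log that
    # starts at or before the oldest kept snapshot may still be needed.
    starts = [z for z, _ in logs if z <= oldest_kept]
    log_cutoff = max(starts) if starts else oldest_kept
    old_snaps = [n for z, n in snaps if z < oldest_kept]
    old_logs = [n for z, n in logs if z < log_cutoff]
    return sorted(old_snaps + old_logs)


def list_candidates(directory, retain=3):
    """Dry run: list candidate paths under `directory` without deleting anything."""
    names = [p.name for p in Path(directory).iterdir() if p.is_file()]
    return [Path(directory) / n for n in purge_candidates(names, retain)]
```

Calling list_candidates on the directory that holds the snapshot.* and log.* files prints nothing by itself; it only returns the paths, so you can review them and move them to another mount rather than deleting them outright.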

The data directory is specified by the dataDir parameter in the ZooKeeper configuration file, and the data log directory is specified by the dataLogDir parameter.
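To find out which directories are actually in play on a node, you can read those two parameters out of the configuration file. This is a small sketch, assuming the file uses plain key=value lines with # comments; read_zk_dirs is a hypothetical helper name, not part of ZooKeeper.

```python
def read_zk_dirs(cfg_text):
    """Return (dataDir, dataLogDir) from the text of a zoo.cfg-style file."""
    props = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    data_dir = props.get("dataDir")
    # When dataLogDir is not set, transaction logs go to dataDir as well.
    return data_dir, props.get("dataLogDir", data_dir)
```

Pass it the contents of your configuration file, for example the result of open(path).read(), where path is wherever zoo.cfg lives on your node.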


2 REPLIES 2

Master Mentor

@rbalam For regular maintenance, follow this guide; it will free up space in your ZooKeeper directories. If you don't care about the data, you can log in to the ZooKeeper CLI and run rmr /path. You will need to stop the ZooKeeper cluster for this. Do this only as a last resort; it is not a safe thing to do.
