NIFI : zookeeper cleanup directory version-2

Rising Star

Hi all,

Does anyone know how to clean up files in the version-2 directory? It has grown to about 1.2 GB.

It contains log.* and snapshot.* files.

/var/opt/hosting/nifi-1.1.0/conf/state/zookeeper
-bash-4.1# du -ks version-2/
1126428 version-2/

thanks

1 ACCEPTED SOLUTION

Super Mentor

@mayki wogno

Is this directory the same size on every one of your ZooKeeper nodes? If not, you may be having an issue on only one of your ZooKeeper servers. You should be able to shut down that ZooKeeper node and purge all those files. The pertinent files will be rewritten from the other servers in the ZooKeeper cluster when the node rejoins it.

ZooKeeper stores information about which node is your current cluster coordinator, which is the primary node, and any cluster-wide state from the various processors in your dataflows.

I am assuming you are running the embedded ZooKeeper here. In that case, the zookeeper.properties file should control automatic purging of the snapshots through the following properties:

autopurge.purgeInterval=24

autopurge.snapRetainCount=30
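
To confirm which values are actually set, the two properties can be read back from the file. The following is a minimal Python sketch, not part of NiFi or ZooKeeper; the function name `read_autopurge` and the sample text are illustrative assumptions. In ZooKeeper, `autopurge.purgeInterval` is expressed in hours (0 disables automatic purging) and `autopurge.snapRetainCount` has a minimum of 3.

```python
def read_autopurge(properties_text):
    """Return the autopurge.* settings found in zookeeper.properties-style text."""
    settings = {}
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip().startswith("autopurge."):
            settings[key.strip()] = int(value.strip())
    return settings


# Hypothetical excerpt of conf/zookeeper.properties, using the values quoted above.
sample = """
# autopurge settings
autopurge.purgeInterval=24
autopurge.snapRetainCount=30
"""
settings = read_autopurge(sample)
print(settings)
```

With these values, a purge is attempted once every 24 hours and the 30 most recent snapshots are retained.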

The transaction logs should be handled through routine maintenance, which is described here:

http://archive.cloudera.com/cdh4/cdh/4/zookeeper/zookeeperAdmin.html#sc_maintenance
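
Before deleting anything by hand, it can help to see which files a retention rule would remove. The sketch below is a dry run in Python that only approximates the rule used by ZooKeeper's own purge utility (`org.apache.zookeeper.server.PurgeTxnLog`), which remains the safer choice for real cleanup. The function names and the sample file names are assumptions for illustration; the only detail taken from ZooKeeper is that the suffix of each file name is a hexadecimal transaction ID (zxid).

```python
def zxid(name):
    """Parse the hexadecimal zxid suffix of a name such as 'log.100000001'."""
    return int(name.split(".", 1)[1], 16)


def plan_purge(file_names, retain_count=3):
    """Return (keep, delete) lists of file names; nothing is removed from disk."""
    if retain_count < 3:
        # ZooKeeper itself refuses to retain fewer than 3 snapshots.
        raise ValueError("retain_count must be at least 3")
    snaps = sorted((n for n in file_names if n.startswith("snapshot.")), key=zxid)
    logs = sorted((n for n in file_names if n.startswith("log.")), key=zxid)
    kept_snaps = snaps[-retain_count:]
    if not kept_snaps:
        # Without a snapshot there is no safe cut-off, so keep everything.
        return snaps + logs, []
    oldest_kept = zxid(kept_snaps[0])
    # Keep every log that starts after the oldest kept snapshot, plus the last
    # log that starts at or before it, since that log can hold later transactions.
    older = [n for n in logs if zxid(n) <= oldest_kept]
    kept_logs = older[-1:] + [n for n in logs if zxid(n) > oldest_kept]
    keep = set(kept_snaps) | set(kept_logs)
    delete = [n for n in snaps + logs if n not in keep]
    return sorted(keep, key=zxid), delete


# Hypothetical directory contents, named in ZooKeeper's format.
names = [
    "log.100000001", "snapshot.100000005",
    "log.200000001", "snapshot.200000006",
    "log.300000001", "snapshot.300000004",
    "log.400000001", "snapshot.400000006",
]
keep, delete = plan_purge(names, retain_count=3)
print("delete:", delete)
```

Here only the oldest snapshot and the log that precedes the retained range are marked for deletion; everything needed to restore from the three newest snapshots is kept.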

Thanks,

Matt


Rising Star

@Matt: it seems that the purge is not running correctly. Yes, I'm running the embedded ZooKeeper with the default properties.

-bash-4.1# ls -rtl | grep snapshot |wc -l
93

-bash-4.1# ls -rtl | more
total 1126420
-rw-r--r-- 1 root root 67108880 Dec 12 10:28 log.100000001
-rw-r--r-- 1 root root      979 Dec 12 14:33 snapshot.200000006
-rw-r--r-- 1 root root 67108880 Dec 12 14:34 log.200000007
-rw-r--r-- 1 root root 67108880 Dec 12 15:00 log.300000001
-rw-r--r-- 1 root root     1167 Dec 12 15:01 snapshot.400000006
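
In this listing the snapshots are about 1 KB each while every log file is 67108880 bytes, so the log.* files account for nearly all of the space. A small Python sketch can total the files by type to confirm that on a full directory; the function name `summarize` is hypothetical, and the demo runs against a temporary directory with tiny stand-in files rather than a real version-2 directory.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def summarize(data_dir):
    """Return {prefix: (file_count, total_bytes)} for names like 'log.*' and 'snapshot.*'."""
    totals = defaultdict(lambda: [0, 0])
    for path in Path(data_dir).iterdir():
        if path.is_file() and "." in path.name:
            prefix = path.name.split(".", 1)[0]
            totals[prefix][0] += 1
            totals[prefix][1] += path.stat().st_size
    return {prefix: tuple(counts) for prefix, counts in totals.items()}


# Demo with small stand-in files; point summarize() at version-2 for real numbers.
with tempfile.TemporaryDirectory() as tmp:
    for name, size in [("log.100000001", 2048),
                       ("snapshot.200000006", 979),
                       ("log.200000007", 2048)]:
        (Path(tmp) / name).write_bytes(b"\0" * size)
    summary = summarize(tmp)

print(summary)
```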

Super Mentor

@mayki wogno

Both ZooKeeper and NiFi can be very resource-intensive applications. The embedded ZooKeeper is fine for development, but I recommend setting up your own external ZooKeeper cluster for production environments. It is possible that load is affecting the ZooKeeper cleanup. You can use the linked ZooKeeper maintenance guide to clean up your ZooKeeper version-2 directory.

Snapshots are nothing more than point-in-time backups. Considering that the information NiFi stores in ZooKeeper is ever changing, I personally don't see much value in being able to restore from a backup (that is, going back to an earlier retained state).

Thanks,

Matt

Super Mentor

Try changing both values from their defaults to much smaller numbers:

autopurge.purgeInterval=1
autopurge.snapRetainCount=3

A restart of ZooKeeper (in your case, NiFi) will be needed for the changes to take effect.