ZooKeeper transaction logs and snapshot files are created very frequently
(multiple files every minute), which fills up the file system in a very short
time.
ROOT CAUSE
One or more applications are creating or modifying znodes too frequently,
causing a very large number of transactions in a short time. This leads to the
creation of many transaction log files and snapshot files, because ZooKeeper
rolls over to a new transaction log and takes a new snapshot after a fixed
number of transactions (100,000 by default, as defined by the ZooKeeper
property 'snapCount').
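To see how a transaction rate turns into file counts, a rough estimate can help. This is a sketch under two assumptions: each roll creates one new transaction log file and one snapshot file, and (as the ZooKeeper Administrator's Guide describes) each server rolls after a randomly chosen number of transactions between snapCount/2+1 and snapCount, so the default of 100,000 means a roll every 50,001 to 100,000 transactions. The function name rolls_per_hour is hypothetical, and the transaction rate is a value you would have to measure yourself.

```shell
#!/usr/bin/env bash
# Rough estimate of how often a ZooKeeper server rolls its transaction log
# and writes a new snapshot, for a given sustained transaction rate.
# Assumption: a roll happens after a random number of transactions in
# [snapCount/2+1, snapCount], so we report the range implied by both ends.
# Usage: rolls_per_hour <transactions_per_second> [snapCount]
rolls_per_hour() {
  local txns_per_sec=$1
  local snap_count=${2:-100000}               # ZooKeeper default snapCount
  local earliest_roll=$(( snap_count / 2 + 1 ))
  local txns_per_hour=$(( txns_per_sec * 3600 ))
  # Fewest rolls: every roll waits the full snapCount transactions.
  local min_rolls=$(( txns_per_hour / snap_count ))
  # Most rolls: every roll happens at the earliest possible point.
  local max_rolls=$(( txns_per_hour / earliest_roll ))
  echo "$min_rolls $max_rolls"
}
```

For example, `rolls_per_hour 5000` prints `180 359`: a sustained 5,000 transactions per second with the default snapCount gives roughly 3 to 6 new transaction log files (and as many snapshots) every minute.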
RESOLUTION
The resolution in such cases is to review the ZooKeeper transaction logs and
find the znodes that are created or updated most frequently, using the
following command on one of the ZooKeeper servers:
# cd /usr/hdp/current/zookeeper-server
# java -cp 'zookeeper.jar:lib/*' org.apache.zookeeper.server.LogFormatter /hadoop/zookeeper/version-2/logxxx
(This assumes 'dataDir' is set to '/hadoop/zookeeper' in the ZooKeeper
configuration. If 'dataLogDir' is set, the transaction logs are written under
that directory instead.)
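The LogFormatter output can be very long, so it helps to count how often each znode path appears. The helper below is hypothetical (top_znodes is not part of ZooKeeper) and is sketched under the assumption that each transaction line prints the znode path as a token starting with a single quote followed by '/', for example `setData '/hbase/...,#...,3`, as the ZooKeeper 3.4 LogFormatter does. The output format can differ between versions, so check a few lines by hand before relying on the counts.

```shell
#!/usr/bin/env bash
# Count how often each znode path appears in LogFormatter output read from
# stdin and print the busiest ones first as "<count> <path>".
# Assumption: each path is printed as a token like '/some/znode followed by a
# comma (or end of line), as in ZooKeeper 3.4's LogFormatter output.
# Usage (hypothetical, run from /usr/hdp/current/zookeeper-server):
#   java -cp 'zookeeper.jar:lib/*' org.apache.zookeeper.server.LogFormatter \
#     /hadoop/zookeeper/version-2/logxxx | top_znodes 20
top_znodes() {
  local limit=${1:-20}
  grep -o "'/[^, ]*" |   # extract tokens that start with '/ (znode paths)
    sed "s/^'//" |       # drop the leading quote
    sort |
    uniq -c |            # count occurrences of each path
    sort -rn |           # most frequent first
    head -n "$limit"
}
```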
Once the frequently updated znodes have been identified, fix the application
that is generating such a large number of updates on ZooKeeper.
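Per-znode counts can be spread thin when an application creates many short-lived znodes with unique names (for example one child znode per region or per lock). Rolling the paths up to their first one or two components, such as an application's top-level znode (for example /hbase), usually makes it clearer which application owns the busy subtree. This is a sketch, assuming as with ZooKeeper 3.4's LogFormatter that each path is printed as a token starting with '/ ; the function name znode_prefix_counts is hypothetical.

```shell
#!/usr/bin/env bash
# Roll znode paths from LogFormatter output (stdin) up to their first <depth>
# path components and count transactions per prefix, busiest first.
# Assumption: each path is printed as a token like '/some/znode, as in
# ZooKeeper 3.4's LogFormatter output.
# Usage: ... LogFormatter <logfile> | znode_prefix_counts 1
znode_prefix_counts() {
  local depth=${1:-1}
  grep -o "'/[^, ]*" |
    sed "s/^'//" |
    awk -F/ -v d="$depth" '{
      # $1 is empty because every path starts with a slash,
      # so the path components begin at $2
      p = ""
      for (i = 2; i <= NF && i <= d + 1; i++) p = p "/" $i
      print p
    }' |
    sort |
    uniq -c |
    sort -rn
}
```

With depth 1 the output groups transactions by top-level znode; raising the depth narrows the result to the specific subtree inside that application.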
An example of an application that can cause this problem is HBase, when a very
large number of regions are stuck in transition and repeatedly fail to come
online.