Created on 11-19-2019 10:39 AM - last edited on 11-19-2019 09:38 PM by VidyaSargur
We have a Kafka cluster with 3 nodes; each Kafka node also runs a ZooKeeper server and Schema Registry.
We get the following error on one of the ZooKeeper servers:
[2019-11-12 07:44:20,719] ERROR Unable to load database on disk (org.apache.zookeeper.server.quorum.QuorumPeer)
java.io.IOException: Unreasonable length = 198238896
at org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:127)
at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:92)
at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:629)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:166)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:601)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:591)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:164)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
It seems that some snapshot files under the folder /opt/confluent/zookeeper/data/version-2 are corrupted.
Under the version-2 folder we have files such as the following:
many files like log.3000667b5
many files like snapshot.200014247
one file - acceptedEpoch
one file - currentEpoch
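For reading the listing above: ZooKeeper names its log and snapshot files with a hexadecimal zxid suffix, where the high 32 bits are the epoch and the low 32 bits are a counter. The sketch below decodes the two example names; the helper name zxid_parts is hypothetical and only for illustration.

```shell
# Split the hex zxid suffix of a ZooKeeper log/snapshot file name
# into its epoch (high 32 bits) and counter (low 32 bits).
# zxid_parts is a hypothetical helper name, not a ZooKeeper tool.
zxid_parts() {
    name="$1"
    hex="${name##*.}"                 # text after the last dot, e.g. 3000667b5
    zxid=$(printf '%d' "0x$hex")      # hex string to decimal
    echo "epoch=$((zxid >> 32)) counter=$((zxid & 0xffffffff))"
}

zxid_parts log.3000667b5        # prints: epoch=3 counter=419765
zxid_parts snapshot.200014247   # prints: epoch=2 counter=82503
```

This only helps judge which files are the most recent; it does not tell whether a file is corrupted.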
So the question is: how do we start the ZooKeeper server?
From my understanding we have two options, but I am not sure about them.
One option is to move the version-2 folder elsewhere as version-2_backup
and create a new, empty version-2 folder under /opt/confluent/zookeeper/data,
then start the ZooKeeper server and hope that the snapshot will be copied from another healthy, active ZooKeeper server?
The second option is to move the version-2 folder elsewhere as version-2_backup, create a new version-2 folder,
and copy all the content of version-2 from a good machine into version-2 on the bad ZooKeeper server, but I am not sure if this is the right option?
Created 11-19-2019 11:21 AM
Yes, "Unable to load database on disk" is due to corruption. Also, as a backup r
# mv /opt/confluent/zookeeper/data/version-2 /tmp
Then restart ZooKeeper; it should copy the snapshot from one of the healthy nodes in the quorum.
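One way to confirm that the restarted node has rejoined the quorum is ZooKeeper's `stat` four-letter command, whose output includes a "Mode:" line. The sketch below is a small, hypothetical helper (zk_mode) that extracts that value; the host, port 2181 (the usual default client port), and the availability of `nc` are assumptions about this cluster.

```shell
# Extract the "Mode:" value (leader, follower, or standalone) from the
# output of ZooKeeper's `stat` four-letter command, read on stdin.
# zk_mode is a hypothetical helper name, not part of ZooKeeper.
zk_mode() {
    awk -F': ' '/^Mode:/ { print $2 }'
}

# Assumed usage against the restarted node (host and port are assumptions):
#   echo stat | nc localhost 2181 | zk_mode
```

A result of follower or leader suggests the node synced with the quorum. On ZooKeeper 3.5 and later, four-letter commands may need to be enabled through 4lw.commands.whitelist before this works.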
HTH
Created 11-19-2019 12:57 PM
Here is a good compromise, assuming you have enough disk space.
Change directory:
# cd /opt/confluent/zookeeper/data
Move the directory:
# mv version-2 version-2_bck
Recreate it with the same ownership (substitute the actual user and group):
# mkdir version-2
# chown user:group version-2
Compare the permissions:
# ls -al
version-2
version-2_bck
Now you can restart ZooKeeper.
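The steps above can be sketched as one function that reads the owner, group, and mode from the old directory instead of typing them by hand. This is a minimal sketch, assuming GNU coreutils `stat` and that ZooKeeper on this node is stopped first; the function name reset_version2 is hypothetical.

```shell
# Set aside a ZooKeeper version-2 directory and recreate it empty with
# the same owner, group, and mode. Assumes GNU coreutils `stat`.
# reset_version2 is a hypothetical helper name for illustration.
reset_version2() {
    data_dir="$1"
    src="$data_dir/version-2"
    bck="$data_dir/version-2_bck"

    if [ ! -d "$src" ]; then
        echo "missing directory: $src" >&2
        return 1
    fi
    if [ -e "$bck" ]; then
        echo "backup already exists: $bck" >&2
        return 1
    fi

    owner=$(stat -c '%u:%g' "$src")   # numeric uid:gid of the old directory
    mode=$(stat -c '%a' "$src")       # octal permission bits, e.g. 755

    mv "$src" "$bck"                  # keep the old data as a backup
    mkdir "$src"                      # new empty directory
    chown "$owner" "$src"
    chmod "$mode" "$src"
}

# Assumed usage, run as root with ZooKeeper stopped on this node:
#   reset_version2 /opt/confluent/zookeeper/data
```

The function refuses to run if version-2_bck already exists, so an earlier backup is not overwritten.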
Created on 11-19-2019 11:22 AM - edited 11-19-2019 11:24 AM
Thank you. Are there any risks with that option?
Do we also need to create an empty version-2 folder under /opt/confluent/zookeeper/data/?
Created on 11-19-2019 11:52 AM - edited 11-19-2019 11:53 AM
Dear Shelton,
do we also need to create an empty version-2 folder under /opt/confluent/zookeeper/data
after we have moved the original version-2 folder?
Created 11-19-2019 12:22 PM
Yes. In fact, a better solution is to mv all the contents of /opt/confluent/zookeeper/data/version-2, usually files such as [log.1, log.18263]. There could be many, which is why it's easier to move them than to delete them. But remember to recreate the version-2 directory with the same user:group and permissions; take note of those details 🙂
HTH
Created 11-19-2019 12:30 PM
I prefer to move the version-2 folder
and create it again with the same permissions and user:group.
Created 11-19-2019 01:09 PM
Thank you so much.
By the way, can I get your advice about another thread: https://community.cloudera.com/t5/Support-Questions/schema-registry-service-failed-to-start-due-sche...