zookeeper server + Unable to load database on disk

We have a Kafka cluster with 3 nodes. Each node also runs a ZooKeeper server and Schema Registry.

We get the following error on one of the ZooKeeper servers:

[2019-11-12 07:44:20,719] ERROR Unable to load database on disk (org.apache.zookeeper.server.quorum.QuorumPeer)
java.io.IOException: Unreasonable length = 198238896
at org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:127)
at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:92)
at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:629)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:166)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:601)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:591)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:164)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)

 

 

It seems that some snapshot files under /opt/confluent/zookeeper/data/version-2 are corrupted.

Under version-2 we have files like these:

many files named like log.3000667b5
many files named like snapshot.200014247
one file: acceptedEpoch
one file: currentEpoch
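
For comparing the files on the bad node with those on the healthy nodes, it can help to know that the suffix of each log.* and snapshot.* file name is a hexadecimal zxid: the high 32 bits are the epoch and the low 32 bits are a counter within that epoch. A minimal sketch that splits the suffix, where zxid_epoch and zxid_counter are hypothetical helper names and not part of ZooKeeper:

```shell
# Sketch: split the hexadecimal zxid suffix of a ZooKeeper data file name
# (e.g. log.3000667b5) into its epoch and counter parts.
# zxid_epoch and zxid_counter are hypothetical helper names.

zxid_epoch() {
    # Strip everything up to the last dot, then take the high 32 bits.
    echo $(( 0x${1##*.} >> 32 ))
}

zxid_counter() {
    # The low 32 bits count transactions within the epoch.
    echo $(( 0x${1##*.} & 0xffffffff ))
}

zxid_epoch log.3000667b5          # epoch of the log file from the listing
zxid_counter snapshot.200014247   # counter of the snapshot from the listing
```

For the two names in the listing above, the log file belongs to epoch 3 and the snapshot to epoch 2.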


So the question is: how do we start the ZooKeeper server?

As I understand it we have two options, but I am not sure about either.

Option one is to move the version-2 folder elsewhere as version-2_backup, create a new version-2 folder under /opt/confluent/zookeeper/data, then start the ZooKeeper server and hope that the snapshot will be copied from another healthy, active ZooKeeper server.

Option two is to move the version-2 folder elsewhere as version-2_backup, create a new version-2 folder, and copy the entire contents of version-2 from a good machine into version-2 on the bad ZooKeeper server. I am not sure this is the right option.

Michael-Bronson
2 ACCEPTED SOLUTIONS

Master Mentor

@mike_bronson7 

Yes, "Unable to load database on disk" is due to corruption. As a backup, move the directory aside:

# mv /opt/confluent/zookeeper/data/version-2 /tmp

Then restart ZooKeeper. It should copy the snapshot from one of the healthy nodes in the quorum.
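
One way to check that the node rejoined the quorum after the restart is to ask it for its role. This is only a sketch: zk_mode is a hypothetical helper, and the usage line assumes nc is installed, the client port is the default 2181, and the srvr four-letter command is allowed on your ZooKeeper version.

```shell
# Sketch: report the role a ZooKeeper server says it has.
# zk_mode is a hypothetical helper. It reads the text reply of the
# "srvr" four-letter command on stdin and prints the value of its
# "Mode:" line (for example "follower" or "leader").
zk_mode() {
    sed -n 's/^Mode: //p'
}

# Usage (assumes nc and the default client port 2181):
#   echo srvr | nc localhost 2181 | zk_mode
```

If the restarted node prints follower or leader, it is serving as part of the quorum again. No output usually means the server is not answering yet.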

 

HTH

Master Mentor

@mike_bronson7 

Here is a good compromise, assuming you have enough disk space.

Change directory:

# cd /opt/confluent/zookeeper/data

Move the directory:

# mv version-2 version-2_bck

Recreate it with the same ownership and permissions:

# mkdir version-2

# chown user:group version-2

Compare the permissions. The listing should show both version-2 and version-2_bck:

# ls -al

Now you can restart ZooKeeper.
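
The steps above can be sketched as one function that reads the owner, group, and mode from the old directory instead of typing them by hand. It assumes GNU/Linux stat (the -c option), and recreate_version_dir is a hypothetical name. Stop ZooKeeper on that node before running it.

```shell
# Sketch of the steps above as one function (GNU/Linux stat assumed).
# recreate_version_dir is a hypothetical helper name.
recreate_version_dir() {
    rvd_data=$1                      # e.g. /opt/confluent/zookeeper/data
    rvd_old="$rvd_data/version-2"
    rvd_bck="$rvd_data/version-2_bck"

    # Need an existing version-2, and refuse to overwrite an earlier backup.
    [ -d "$rvd_old" ] && [ ! -e "$rvd_bck" ] || return 1

    # Record owner, group, and mode before moving anything.
    rvd_owner=$(stat -c '%u:%g' "$rvd_old") || return 1
    rvd_mode=$(stat -c '%a' "$rvd_old") || return 1

    mv "$rvd_old" "$rvd_bck" || return 1
    mkdir "$rvd_old" || return 1
    chmod "$rvd_mode" "$rvd_old" || return 1

    # chown only when needed, since changing ownership requires root.
    if [ "$(stat -c '%u:%g' "$rvd_old")" != "$rvd_owner" ]; then
        chown "$rvd_owner" "$rvd_old" || return 1
    fi
}

# Usage:
#   recreate_version_dir /opt/confluent/zookeeper/data
```

Because the backup stays next to the data directory, it needs as much free space on that disk as the original version-2 occupied.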

 


7 REPLIES


Thank you. Are there any risks with that option?

Do we also need to create an empty version-2 folder under /opt/confluent/zookeeper/data/?

Michael-Bronson

Dear Shelton,

Do we also need to create an empty version-2 folder under /opt/confluent/zookeeper/data after we move the original version-2 folder?

Michael-Bronson

Master Mentor

@mike_bronson7 

 

Yes. In fact, a better solution is to mv all the contents of /opt/confluent/zookeeper/data/version-2. There are usually many files (e.g. log.1, log.18263), which is why it is easier to move them than to delete them. But remember to recreate the version-2 directory with the same user:group and permissions, so take note of those details 🙂
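
Moving the contents rather than the directory could look like the sketch below. move_contents_aside is a hypothetical helper name, the backup path is whatever location you choose, and the -mindepth and -maxdepth options assume GNU or BSD find. Run it with ZooKeeper stopped on that node.

```shell
# Sketch: move the contents of version-2 into a backup directory,
# leaving the version-2 directory itself in place.
# move_contents_aside is a hypothetical helper name.
move_contents_aside() {
    mca_src=$1      # e.g. /opt/confluent/zookeeper/data/version-2
    mca_dst=$2      # e.g. a backup directory on a disk with enough space

    mkdir -p "$mca_dst" || return 1

    # find handles many files without hitting the shell's argument limit.
    find "$mca_src" -mindepth 1 -maxdepth 1 -exec mv {} "$mca_dst"/ \;
}

# Usage (the backup path is only an example):
#   move_contents_aside /opt/confluent/zookeeper/data/version-2 /tmp/version-2_bck
```

Since the directory itself is never removed in this variant, its owner and permissions stay as they were.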

HTH 

I prefer to move the version-2 folder and create it again with the same permissions and user:group.

Michael-Bronson


Thank you so much.

By the way, can I get your advice on another thread? https://community.cloudera.com/t5/Support-Questions/schema-registry-service-failed-to-start-due-sche...

Michael-Bronson