Move zookeeper datadir and datalogdir to new disk
Labels: Apache Zookeeper
Created on 08-25-2016 06:29 AM - edited 09-16-2022 03:36 AM
I am looking for guidance on moving the ZooKeeper dataDir and dataLogDir to a dedicated disk, as best practice recommends. I have not been able to find any documentation that covers this. Currently the disk holding dataDir and dataLogDir is shared with other processes.
Any assistance is appreciated.
Created on 10-07-2016 01:44 AM - edited 10-07-2016 01:46 AM
Let's say your dataDir and old dataLogDir are both /var/lib/zookeeper and you're now moving dataLogDir to /var/lib/zookeeper-log. First, change dataLogDir in the service-wide configuration, which will make the stale configuration icon appear. Then stop zk1, ssh into zk1, and run the following commands:
$ mkdir -p /var/lib/zookeeper-log/version-2
$ cp /var/lib/zookeeper/version-2/log.* /var/lib/zookeeper-log/version-2/
$ chown -R zookeeper:zookeeper /var/lib/zookeeper-log
Then start zk1 and wait until it is running and shows as either leader or follower on the Cloudera Manager service page. Once that's done, repeat the same steps on zk2 and finally on zk3. At that point the stale configuration alert should disappear and everything should be fine cluster-wide.
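If you'd rather confirm the role from the command line than from the Cloudera Manager page, here is a rough sketch. It assumes the `srvr` four-letter command is enabled on your servers (newer ZooKeeper releases restrict these commands through a whitelist setting), that `nc` is installed, and that the client port is 2181. The function names are made up for illustration.

```shell
#!/bin/sh
# zk_mode_from_srvr: reads the text returned by the `srvr` command on stdin
# and prints only the value of its "Mode:" line (leader, follower, standalone).
zk_mode_from_srvr() {
    sed -n 's/^Mode: //p'
}

# zk_mode HOST [PORT]: asks a live server for its mode.
# Assumption: the client port is 2181 unless a second argument is given.
zk_mode() {
    echo srvr | nc "$1" "${2:-2181}" | zk_mode_from_srvr
}

# Example (not run here): zk_mode zk1
```

Empty output usually means the server is not answering yet, so wait and retry before moving on to the next host.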
As you said, only the log.* files need to be copied.
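The copy step above could also be wrapped in a small function that checks that every log file arrived before you start the server again. This is only a sketch under the same assumptions as the commands above (a version-2 subdirectory, log.* file names); `copy_txn_logs` and `count_logs` are hypothetical helper names, not part of ZooKeeper.

```shell
#!/bin/sh
# count_logs DIR: prints how many log.* files exist in DIR.
count_logs() {
    n=0
    for f in "$1"/log.*; do
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# copy_txn_logs OLD_DIR NEW_DIR
# Copies only the transaction logs (log.*) from OLD_DIR/version-2 to
# NEW_DIR/version-2. Snapshots and epoch files are left where they are.
# Returns non-zero if the copy fails or the file counts differ.
copy_txn_logs() {
    src="$1/version-2"
    dst="$2/version-2"
    mkdir -p "$dst" || return 1
    cp -p "$src"/log.* "$dst"/ || return 1
    [ "$(count_logs "$src")" -eq "$(count_logs "$dst")" ]
}

# Example, run as root on a stopped server (paths from the post above):
# copy_txn_logs /var/lib/zookeeper /var/lib/zookeeper-log &&
#     chown -R zookeeper:zookeeper /var/lib/zookeeper-log
```

The old log.* files stay in place, so you can point dataLogDir back at the old directory if anything goes wrong.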
Created 08-29-2016 06:52 AM
When it comes to using a dedicated disk, dataLogDir matters more than dataDir. ZK calls fsync on the transaction logs it writes to dataLogDir, and that call can block for a long time when other processes share the disk.
You can and should keep the dataDir (where snapshots get stored) separate from the dataLogDir. This way large snapshot writes don't affect the transaction logging performance either. The dataDir location can be on a shared disk as its write is not synchronous.
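A quick way to see whether two directories currently share a filesystem is to compare their device numbers. The sketch below assumes GNU coreutils `stat` (the `-c %d` format); `same_device` is a hypothetical helper name. Note the limits of this check: different device numbers mean different filesystems, but two filesystems can still sit on the same physical disk (partitions, LVM), so treat the result as a first hint only.

```shell
#!/bin/sh
# same_device PATH_A PATH_B
# Succeeds (exit status 0) when both paths report the same device number,
# i.e. they live on the same filesystem.
# Assumption: GNU coreutils `stat`, which supports `-c %d`.
same_device() {
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

# Example (paths are illustrative):
# if same_device /var/lib/zookeeper /var/lib/zookeeper-log; then
#     echo "dataDir and dataLogDir share a filesystem"
# fi
```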
Does this help?
Created 08-29-2016 07:25 AM
Thank you for the reply. In our current setup, dataDir and dataLogDir point to the same location. The directory contains the following files:
acceptedEpoch
currentEpoch
log.xxxxxx
snapshot.xxxx
If I only have to move the dataLogDir, I just need to copy the log.xxx files to the new location on all ZooKeeper servers, update the directory in the configuration, and restart the ZooKeeper instances. Could you please confirm that this is correct?
The epoch files and snapshot.xxx belong in the dataDir, correct?
