Changing Zookeeper Data Dir

Expert Contributor

Hello,

Currently our ZooKeeper dataDir is at `/dfs/1/hadoop/zookeeper/`, but unfortunately `/dfs/1/` is an HDFS disk mount. As a result, we cannot swap the HDFS disks, because ZooKeeper is also using that mount.

We want to move the ZooKeeper dataDir somewhere else, such as `/usr/lib/zookeeper`, but I am not sure which steps need to be taken. Here's what I think should work:

  • Create the new directory.
  • Stop ZooKeeper and HBase.
  • Copy the data from the old ZooKeeper dataDir to the new one.
  • Change the ZooKeeper configuration so that dataDir points to the new directory.
  • Start ZooKeeper and HBase.

What I'm unsure of is whether copying the data is the correct way to do this. We do not have a staging cluster, hence I'm seeking help from the community 🙂

Much thanks!

Sanket.

7 REPLIES

Expert Contributor

If you have 3 or more ZooKeeper servers, you can carry out these steps on each server one by one, in a rolling fashion, which keeps the ZooKeeper quorum intact. Otherwise, the steps you mentioned are fine.

Copy all files inside the current dataDir (myid, version-2) to the new directory, update the 'dataDir' and 'dataLogDir' (if separately configured) properties in zoo.cfg, recursively set the directory ownership to the ZooKeeper service user, and restart the ZooKeeper server. During startup, ZooKeeper loads the latest snapshot file and replays the transaction log to rebuild its state. Follower servers also sync with the leader to obtain the current state.
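The copy step described above could be sketched roughly as follows. This is only an illustration: the function name `copy_zk_datadir` and its two arguments are hypothetical, and it assumes the ZooKeeper server on this node is already stopped.

```shell
#!/bin/sh
# Sketch: copy a ZooKeeper dataDir to a new location and sanity-check the result.
# copy_zk_datadir is a hypothetical helper, not part of ZooKeeper or Ambari.
copy_zk_datadir() {
    old_dir=$1
    new_dir=$2

    # Refuse to continue if the source does not look like a ZooKeeper dataDir.
    [ -f "$old_dir/myid" ] || { echo "no myid in $old_dir" >&2; return 1; }
    [ -d "$old_dir/version-2" ] || { echo "no version-2 in $old_dir" >&2; return 1; }

    mkdir -p "$new_dir" || return 1

    # "/." copies the directory contents; -p preserves modes and timestamps.
    cp -Rp "$old_dir/." "$new_dir/" || return 1

    # The server id must be identical after the move.
    cmp -s "$old_dir/myid" "$new_dir/myid"
}
```

With the paths from the question this would be called as `copy_zk_datadir /dfs/1/hadoop/zookeeper /usr/lib/zookeeper`, followed by a recursive `chown` of the new directory to the ZooKeeper service user (run as root) before the server is started again.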

Expert Contributor

Thanks @rmaruthiyodan. So let me confirm the steps:

  1. Change the dataDir setting in Ambari. (dataLogDir is not separately configured.)
  2. Shut down the ZooKeeper node.
  3. Copy the contents (myid and version-2/) to the new directory and change the permissions of the folder.
  4. Start ZooKeeper.
  5. Repeat steps 2-4 for the other two ZooKeeper nodes.

Yes, we have 3 ZooKeeper nodes. I wanted to ask whether the above steps can be executed while HBase is running. (I assume they can.)
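Before starting a node in step 4, it may be worth confirming that the zoo.cfg on that host really points at the new directory. A minimal sketch, assuming the rendered config lives at `/etc/zookeeper/conf/zoo.cfg` (that path and the helper name `zk_datadir_from_cfg` are assumptions, not something stated in this thread):

```shell
#!/bin/sh
# Sketch: print the dataDir configured in a zoo.cfg file.
# zk_datadir_from_cfg is a hypothetical helper name.
zk_datadir_from_cfg() {
    # Print the value of the last uncommented "dataDir=" line.
    # A "dataLogDir=" line does not match this pattern.
    sed -n 's/^[[:space:]]*dataDir[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}
```

For example, `zk_datadir_from_cfg /etc/zookeeper/conf/zoo.cfg` should print the new directory once the updated configuration has been pushed to the host.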

Expert Contributor

@sanket patel Yes, the steps above look good.

Expert Contributor

@Karthik Palanisamy I am trying to work out a rolling approach; in my comment above, I suggest repeating steps 2, 3, and 4 for each ZooKeeper server, one after the other. Is that the correct way to go about this?

Expert Contributor

Yes @sanket patel, you can proceed with the same steps.

ZooKeeper will sync the latest snapshot and logs from the remaining quorum members, so you can skip step 3 or do the copy anyway if you prefer. Either way, make sure you allow a little time for the sync before restarting the next ZooKeeper server.
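One way to tell whether a restarted server has rejoined the quorum before moving on is to ask it for its mode with the `srvr` four-letter command. The sketch below assumes the default client port 2181, that `nc` is installed, and that `srvr` is enabled (on newer ZooKeeper releases it may need to be allowed via `4lw.commands.whitelist`); the function names are hypothetical. Note also that if the copy in step 3 is skipped, the new dataDir presumably still needs its `myid` file for the server to start.

```shell
#!/bin/sh
# Sketch: check whether a ZooKeeper server is serving requests again.
# zk_mode and zk_is_serving are hypothetical helper names.

zk_mode() {
    # Reads `srvr` output on stdin and prints the value of the "Mode:" line.
    awk -F': ' '/^Mode:/ { print $2 }'
}

zk_is_serving() {
    host=$1
    port=${2:-2181}   # assumed default client port
    mode=$(echo srvr | nc "$host" "$port" 2>/dev/null | zk_mode)
    case $mode in
        leader|follower|observer|standalone) return 0 ;;
        *) return 1 ;;
    esac
}
```

A rolling restart could then wait with something like `until zk_is_serving <host>; do sleep 5; done` before the next server is stopped.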

Expert Contributor

Thanks all for the help! I carried out the steps as I described in the question. Please include a `chown -R` operation too before starting the services, as mentioned by @rmaruthiyodan.

We did it with approximately 5 minutes of downtime, though. If anyone else carries out this operation without downtime (in a rolling fashion), please let the community know.

Contributor

Hi, please find below the steps for moving the ZooKeeper data directory.

  1. Change the dataDir setting in Ambari (go to Ambari -> ZooKeeper -> Configs -> ZooKeeper Server -> ZooKeeper directory and set it to /mnt/scratch/zookeeper).
  2. Stop all ZooKeeper servers (ZooKeeper -> Service Actions -> Stop).
  3. Copy the contents (myid and version-2/) to the new directory and set the ownership of the folder. Log in to the zookeeper1 node and run:
       $ cp -r /mnt/sda/zookeeper/* /mnt/scratch/zookeeper/
       $ chown -R zookeeper:hadoop /mnt/scratch/zookeeper/
  4. Start only the zookeeper1 node's ZooKeeper server from the Ambari UI.
  5. Repeat steps 2-4 for the other two ZooKeeper servers (zookeeper2 and zookeeper3).
  6. Restart all services if required.