Kudu Master Directories

Explorer

Hello,

 

We have just installed Kudu in our test environment, and are currently running CDH 5.13.1. Due to this being a small POC environment, we only have 2 tablet servers, and a single master, making it only usable for functional testing.

 

There were 4 required configuration properties upon installation of Kudu with CDH 5.13, which we configured as follows:

 

Kudu Master WAL Directory:

/data/kudu/master_wal

 

Kudu Master Data Directories

/data/kudu/master_wal

 

Kudu Tablet Server WAL Directory

/data1/kudu/tablet_wal

 

Kudu Tablet Server Data Directories

/data1/kudu/tablet_data

/data2/kudu/tablet_data

/data3/kudu/tablet_data

 

My question is concerning the Master data directories configuration property:

 

Should multiple directories be used for storing the Kudu master data? It appears this is expected with the configuration property being plural, and it's set up to be configured similar to the tablet server data directories from Cloudera Manager. But if the Kudu Master server resides on one of the master/utility nodes, then there are not multiple JBOD mount points like a worker node.

 

Are there significant benefits to having multiple Kudu master data directories, or inherent risks with just a single master data directory? If we configured an additional master data directory on the OS disk (such as under /var or /opt), would this be a concern?

 

I've read that SSDs are recommended for the WAL directories. Is there a major performance impact if the WAL directory is on the same mount point as one of the data directories?

 

Thank you,

Braz

1 ACCEPTED SOLUTION

Rising Star

> Should multiple directories be used for storing the Kudu master data?
The master nodes generally don't see a huge amount of disk IO, as their role is
primarily focused on tablet placement, rather than data storage. The reason
fs_data_dirs is plural for the master is that tablet servers and master nodes
leverage the same FS configuration code. Feel free to use a single directory.

I wouldn't expect it to bottleneck your cluster.
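
For anyone who wants to see what a single-directory master layout looks like at the flag level, here is a minimal sketch. It only renders the two flags discussed in this thread (fs_wal_dir and fs_data_dirs) as gflagfile-style lines. The helper name `render_kudu_fs_flags` and the `/data/kudu/master_data` path are made up for illustration; in Cloudera Manager you would set the equivalent directory properties in the UI rather than write these lines yourself.

```python
def render_kudu_fs_flags(wal_dir, data_dirs):
    """Return gflagfile-style lines for Kudu's fs_wal_dir and fs_data_dirs.

    Hypothetical helper for illustration only; it is not part of Kudu or CM.
    """
    if not data_dirs:
        raise ValueError("at least one data directory is required")
    return [
        f"--fs_wal_dir={wal_dir}",
        # fs_data_dirs is a comma-separated list, even when it has one entry.
        f"--fs_data_dirs={','.join(data_dirs)}",
    ]


# Single data directory for the master (the data path is an assumed example).
master_flags = render_kudu_fs_flags(
    "/data/kudu/master_wal",
    ["/data/kudu/master_data"],
)
print("\n".join(master_flags))
```

The same helper accepts several data directories, which is the case that matters on tablet servers rather than on the master.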

 

> Are there significant benefits of having multiple Kudu master data
> directories or inherent risks with just a single master data directory?
Not really. The masters aren't a bottleneck for the most part, and they only
store a few GBs on disk. Also, disk failures are not handled for masters the
way they are on tablet servers, so extra disks don't provide any added fault
tolerance either.

 

> I've read that SSDs are recommended for the WAL directories. Is there a major
> performance impact if the WAL directory is on the same mount point as one of
> the data directories?
It's not uncommon to see fs_wal_dir set to the same path as the first
entry of fs_data_dirs. One caveat is that in Kudu 1.5 and below,
the first data directory also stored tablet-specific metadata that is used for
the Raft consensus protocol, and we've seen this lead to occasional dips in
performance when tablet server ingest workloads coincide with periods of high
Raft election traffic. This is less relevant for masters, which generally don't
get bottlenecked by disk IO.
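
If you want to check whether a WAL directory ends up on the same filesystem as any of the data directories, a rough sketch like the one below can help. It compares device IDs from `os.stat`, so it assumes a POSIX host. The function names are made up for illustration. Note that the same device ID means the same filesystem, which is not always the same thing as the same physical disk (for example with LVM or multiple partitions on one drive).

```python
import os


def nearest_existing(path):
    """Walk up from path until a directory that exists is found."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def data_dirs_sharing_wal_filesystem(wal_dir, data_dirs):
    """Return the data dirs whose filesystem device matches wal_dir's."""
    wal_dev = os.stat(nearest_existing(wal_dir)).st_dev
    return [
        d for d in data_dirs
        if os.stat(nearest_existing(d)).st_dev == wal_dev
    ]


if __name__ == "__main__":
    # Paths taken from the tablet server configuration in the question.
    shared = data_dirs_sharing_wal_filesystem(
        "/data1/kudu/tablet_wal",
        ["/data1/kudu/tablet_data", "/data2/kudu/tablet_data",
         "/data3/kudu/tablet_data"],
    )
    print("Data dirs on the WAL's filesystem:", shared)
```

Paths that do not exist yet are resolved to their nearest existing parent, so the check can be run before the directories are created.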

Explorer
Thanks for these answers. Having issues with replying, but for your last answer concerning the WAL directory and the metadata, would you recommend having a separate directory for the Tablet Server WAL?

Thanks

Rising Star

Yep, that would be ideal, since background flushes/compactions would then not affect write performance and Raft elections.

Explorer
Okay, and I hope I'm not asking too much in this one forum post, but since it's related: what is the recommended number of Tablet Server directories?

Could we match the number of directories to the JBOD disks used by the DataNode, one per disk? Of course, without using a subdirectory of the DataNode directory.

Rising Star

That is up to your workload and how much storage you need per node. It's common to see anywhere from 6 to 12 disks per tablet server. Check out the limitations documentation for some guidance there.
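
As a rough illustration of the one-directory-per-JBOD-disk idea from the question, the sketch below builds one Kudu data directory per mount point and refuses any path that would sit inside a DataNode directory. The function name and the `/dataN/dfs/dn` DataNode paths are assumptions for the example, not taken from this thread; substitute whatever your DataNode actually uses.

```python
from pathlib import PurePosixPath


def kudu_data_dirs(mounts, datanode_dirs, subdir="kudu/tablet_data"):
    """Build one Kudu data dir per mount, kept outside DataNode dirs.

    Hypothetical helper for illustration only.
    """
    dn_paths = [PurePosixPath(d) for d in datanode_dirs]
    result = []
    for mount in mounts:
        candidate = PurePosixPath(mount) / subdir
        for dn in dn_paths:
            # Reject a candidate that equals or is nested under a DataNode dir.
            if candidate == dn or dn in candidate.parents:
                raise ValueError(f"{candidate} is inside DataNode dir {dn}")
        result.append(str(candidate))
    return result


dirs = kudu_data_dirs(
    ["/data1", "/data2", "/data3"],
    # Assumed DataNode directories, one per JBOD disk.
    ["/data1/dfs/dn", "/data2/dfs/dn", "/data3/dfs/dn"],
)
print(",".join(dirs))
```

The comma-joined output is in the form fs_data_dirs expects. How many disks to use is still a workload and capacity decision, as noted above.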