Created on 04-03-2018 02:50 PM - edited 09-16-2022 06:03 AM
Hello,
We have just installed Kudu in our test environment, and are currently running CDH 5.13.1. Due to this being a small POC environment, we only have 2 tablet servers, and a single master, making it only usable for functional testing.
There were 4 required configuration properties upon installation of Kudu with CDH 5.13, for which the following were configured:
Kudu Master WAL Directory
/data/kudu/master_wal
Kudu Master Data Directories
/data/kudu/master_wal
Kudu Tablet Server WAL Directory
/data1/kudu/tablet_wal
Kudu Tablet Server Data Directories
/data1/kudu/tablet_data
/data2/kudu/tablet_data
/data3/kudu/tablet_data
My question is concerning the Master data directories configuration property:
Should multiple directories be used for storing the Kudu master data? It appears this is expected with the configuration property being plural, and it's set up to be configured similar to the tablet server data directories from Cloudera Manager. But if the Kudu Master server resides on one of the master/utility nodes, then there are not multiple JBOD mount points like a worker node.
Are there significant benefits of having multiple Kudu master data directories or inherent risks with just a single master data directory? If we configured an additional master data directory on the OS disk (such as under /var or /opt), would this be a concern?
I've read that SSDs are recommended for the WAL directories. Is there a major performance impact if the WAL directory is on the same mount point as one of the data directories?
Thank you,
Braz
Created 04-03-2018 03:17 PM
> Should multiple directories be used for storing the Kudu master data?
The master nodes generally don't see a huge amount of disk IO, as their role is
primarily focused on tablet placement rather than data storage. The reason
fs_data_dirs is plural for the master is that tablet servers and master nodes
share the same FS configuration code. Feel free to use a single directory.
I wouldn't expect it to bottleneck your cluster.
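As a rough illustration of the point above, here is a minimal Python sketch that renders the two FS flags for a master with a single data directory and for a tablet server with several. The helper name kudu_fs_flags is made up for this example; the paths are the ones from the question, and fs_data_dirs is written as a comma-separated list.

```python
def kudu_fs_flags(wal_dir, data_dirs):
    """Render the --fs_wal_dir and --fs_data_dirs flag lines.

    Hypothetical helper for illustration only; it just formats strings
    and does not touch the filesystem or any Kudu process.
    """
    if not data_dirs:
        raise ValueError("at least one data directory is required")
    return [
        f"--fs_wal_dir={wal_dir}",
        # Multiple data directories are joined with commas.
        f"--fs_data_dirs={','.join(data_dirs)}",
    ]


# Master: a single data directory, using the paths from the question.
master_flags = kudu_fs_flags("/data/kudu/master_wal", ["/data/kudu/master_wal"])

# Tablet server: one WAL directory and three data directories.
tserver_flags = kudu_fs_flags(
    "/data1/kudu/tablet_wal",
    [
        "/data1/kudu/tablet_data",
        "/data2/kudu/tablet_data",
        "/data3/kudu/tablet_data",
    ],
)

for line in master_flags + tserver_flags:
    print(line)
```

The same function handles both roles, which mirrors why the property is plural even when a master only needs one entry.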
> Are there significant benefits of having multiple Kudu master data
> directories or inherent risks with just a single master data directory?
Not really. The masters aren't a bottleneck for the most part, and they only
store a few GBs on disk. Also, masters do not handle disk failures the way
tablet servers do, so extra disks don't provide any added fault tolerance
either.
> I've read that SSDs are recommended for the WAL directories. Is there a major
> performance impact if the WAL directory is on the same mount point as one of
> the data directories?
It's not uncommon to see this, where fs_wal_dir is the same as the first
entry of fs_data_dirs. One caveat: in Kudu 1.5 and below, the first data
directory also stored the tablet-specific metadata used by the Raft consensus
protocol, and we've seen this lead to occasional dips in performance when
tablet server ingest workloads coincide with periods of high Raft election
traffic. This is less relevant for masters, which generally don't get
bottlenecked by disk IO.
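If it helps to check a layout before deploying, the following Python sketch reports which data directories sit on the same mount point as the WAL directory. It is only an illustration: the function names are made up, the comparison is done on path strings rather than by querying the OS, and the mount point list (/, /data1, /data2, /data3) is an assumption based on the paths in the question.

```python
from pathlib import PurePosixPath


def mount_point_for(path, mount_points):
    """Return the longest mount point that contains path, or None."""
    p = PurePosixPath(path)
    best = None
    for m in mount_points:
        mp = PurePosixPath(m)
        # A mount point contains the path if it equals it or is an ancestor.
        if p == mp or mp in p.parents:
            if best is None or len(mp.parts) > len(best.parts):
                best = mp
    return str(best) if best is not None else None


def data_dirs_sharing_wal_mount(wal_dir, data_dirs, mount_points):
    """List the data directories on the same mount point as the WAL directory."""
    wal_mount = mount_point_for(wal_dir, mount_points)
    if wal_mount is None:
        return []
    return [d for d in data_dirs if mount_point_for(d, mount_points) == wal_mount]


# Assumed mount points; adjust to match the actual host.
mounts = ["/", "/data1", "/data2", "/data3"]

shared = data_dirs_sharing_wal_mount(
    "/data1/kudu/tablet_wal",
    [
        "/data1/kudu/tablet_data",
        "/data2/kudu/tablet_data",
        "/data3/kudu/tablet_data",
    ],
    mounts,
)
print(shared)
```

With the tablet server paths from the question and the assumed mounts, only the first data directory shares a disk with the WAL, which is the situation described above.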
Created 04-04-2018 02:02 PM
Created 04-04-2018 03:12 PM
Yep, that would be ideal in that background flushes/compactions would not affect write performance and Raft elections.
Created 04-05-2018 10:06 AM
Created on 04-05-2018 10:23 AM - edited 04-05-2018 10:24 AM
That is up to your workload and how much storage you need per node. It's common to see anywhere from 6 to 12 disks per tablet server. Check out the limitations documentation for some guidance there.