Kudu Size on Disk Compared to Parquet

Explorer

Hi guys,

We ran some tests comparing Kudu with Parquet. The Parquet data totaled about 170 GB. Our issue is that Kudu uses about twice as much disk space as Parquet (without any replication). We measured the size of the data folder on disk with "du". The WAL was in a different folder, so it was not included.
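For anyone who wants to repeat the measurement, here is a minimal sketch that walks a data folder and reports both the apparent size (sum of file lengths) and the allocated size (blocks actually used on disk). The distinction can matter because, as far as I know, Kudu's log block manager punches holes in its container files, so the two numbers may differ; plain "du" reports the allocated size by default. The function name dir_sizes and the path in the usage comment are made up for illustration.

```python
import os
import stat


def dir_sizes(root):
    """Return (apparent_bytes, allocated_bytes) for regular files under root."""
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, and other special files
            apparent += st.st_size
            # st_blocks counts 512-byte units on Linux; fall back to the
            # file length on platforms that do not expose st_blocks.
            blocks = getattr(st, "st_blocks", None)
            allocated += st.st_size if blocks is None else blocks * 512
    return apparent, allocated


# Usage (hypothetical path, adjust to your own data directory):
#   apparent, allocated = dir_sizes("/data/kudu/tserver/data")
#   print(apparent / 1e9, "GB apparent;", allocated / 1e9, "GB allocated")
```

Hard-linked files would be counted once per link here, which "du" avoids, so treat the result as a rough cross-check rather than an exact replacement.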

 

Below is the schema for our table. Columns 0-7 form the primary key, and we can't change that because all of them are needed for uniqueness.

We are working with Kudu 1.6.0.

Any ideas why Kudu uses twice as much disk space as Parquet? Or is this expected behavior? We created about 2400 tablets distributed over 4 servers.

Cheers

[Image: image001.png (table schema)]

2 REPLIES

Re: Kudu Size on Disk Compared to Parquet

Explorer

I've checked some Kudu metrics and found that the metric "kudu_on_disk_data_size" shows more or less the same size as the Parquet files. However, the "kudu_on_disk_size" metric correlates with the size on disk. I've created a new thread to discuss these two Kudu metrics. I hope somebody can explain the difference.

New thread:
https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Kudu-Metrics-kudu-on-disk-data-size-am...

Re: Kudu Size on Disk Compared to Parquet

Expert Contributor

I think Todd answered your question in the other thread pretty well. To support its online, indexed performance, Kudu stores additional data structures that Parquet doesn't have, including row indexes and bloom filters, and these require additional space on top of what Parquet requires.

The kudu_on_disk_size metric also includes the size of the WAL and other metadata files like the tablet superblock and the consensus metadata (although those last two are usually relatively small).
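To see how much of the total is overhead rather than column data, the two metrics can be summed across tablets and subtracted. Below is a rough sketch, assuming a JSON dump shaped like what a tablet server's /metrics page returns (a list of entities, each with a "type" and a "metrics" list) and assuming the raw metric names are on_disk_size and on_disk_data_size without the "kudu_" prefix. Both the shape and the names are assumptions to check against your own output, and the SAMPLE numbers are invented purely for illustration.

```python
import json

# Invented sample payload; real output has many more entities and metrics.
SAMPLE = json.dumps([
    {"type": "server", "id": "kudu.tabletserver",
     "metrics": [{"name": "threads_running", "value": 42}]},
    {"type": "tablet", "id": "tablet-a",
     "metrics": [{"name": "on_disk_size", "value": 300},
                 {"name": "on_disk_data_size", "value": 200}]},
    {"type": "tablet", "id": "tablet-b",
     "metrics": [{"name": "on_disk_size", "value": 500},
                 {"name": "on_disk_data_size", "value": 350}]},
])


def sum_tablet_metrics(metrics_json,
                       names=("on_disk_size", "on_disk_data_size")):
    """Sum the named metrics over all entities of type 'tablet'."""
    totals = {name: 0 for name in names}
    for entity in json.loads(metrics_json):
        if entity.get("type") != "tablet":
            continue  # skip server-level entities
        for metric in entity.get("metrics", []):
            name = metric.get("name")
            if name in totals:
                totals[name] += metric.get("value", 0)
    return totals


def overhead_bytes(totals):
    """Bytes counted in on_disk_size but not in on_disk_data_size."""
    return totals["on_disk_size"] - totals["on_disk_data_size"]


totals = sum_tablet_metrics(SAMPLE)
print(totals, "overhead:", overhead_bytes(totals))
```

Running the same sum on each of the 4 tablet servers and adding the results should give totals comparable to the "du" figure and to the Parquet size, respectively.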