Hi guys,
We have run some tests comparing Kudu with Parquet. The Parquet data came to about 170 GB in total. Our issue is that Kudu uses about twice as much disk space as Parquet (without any replication). We measured the size of the data folder on disk with "du". The WAL was in a different folder, so it is not included.
Below is the schema for our table. Columns 0-7 form the primary key, and we can't change that because we need them for uniqueness.
We are working with Kudu 1.6.0.
Any ideas why Kudu uses twice as much disk space as Parquet? Or is this expected behavior? We created about 2400 tablets distributed over 4 servers.
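In case it helps, here is a rough back-of-the-envelope calculation of the average on-disk size per tablet, based on the numbers above. It is only a sketch: the factor of 2 is approximate, the 1 GB = 1024 MB conversion is an assumption, and the function names are just for illustration.

```python
# Rough estimate only; the 2x factor and the GB-to-MB conversion are assumptions.
PARQUET_GB = 170   # total Parquet size from our test
KUDU_FACTOR = 2    # Kudu used roughly twice the space (approximate)
TABLETS = 2400     # number of tablets we created
SERVERS = 4        # number of tablet servers


def per_tablet_mb(parquet_gb, factor, tablets):
    """Average on-disk size per tablet in MB, taking 1 GB = 1024 MB."""
    return parquet_gb * factor * 1024 / tablets


def tablets_per_server(tablets, servers):
    """Average number of tablets hosted by each server."""
    return tablets / servers


if __name__ == "__main__":
    print(f"avg size per tablet: {per_tablet_mb(PARQUET_GB, KUDU_FACTOR, TABLETS):.1f} MB")
    print(f"tablets per server:  {tablets_per_server(TABLETS, SERVERS):.0f}")
```

So each tablet holds only about 145 MB on average, with around 600 tablets per server. I don't know whether that many small tablets contributes to the overhead.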
Cheers
