Created on 06-14-2017 09:35 AM - edited 09-16-2022 04:45 AM
Hi,
I've seen that when I create an empty partition in Kudu, it occupies around 65 MiB on disk. I have some cases with a huge number of partitions, and this space is eating up the disk for partitions that are empty! Is there a way to change this default space occupied by each partition? How?
Thank you very much!
Created 06-16-2017 11:19 AM
Yes, what you're observing is Kudu preallocating one 64 MB write-ahead log segment for each partition. The space will be filled once you start writing to the partition.
In Kudu 1.4 we dropped the segment size from 64 MB to 8 MB. If you'd like to make that change now, you can do so via the --log_segment_size_mb command line option. An alternative would be to disable preallocation via --log_async_preallocate_segments=false and/or --log_preallocate_segments=false, but that's not something we generally test so I would advise against it.
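To put the two segment sizes in perspective, here is a rough Python sketch of the arithmetic. It assumes one preallocated segment per partition, as described above, and that the flag's "MB" means 2^20 bytes (the 64 MB segment in this thread shows up on disk as 67108864 bytes). The function name is made up for illustration, and it only counts the partitions hosted on a single tablet server.

```python
MIB = 1024 * 1024  # assumption: --log_segment_size_mb is expressed in units of 2^20 bytes


def preallocated_wal_bytes(num_partitions: int, segment_size_mb: int) -> int:
    """Space reserved when each partition preallocates one WAL segment."""
    return num_partitions * segment_size_mb * MIB


# Compare the 64 MB default seen in this thread with the 8 MB value from Kudu 1.4.
for size_mb in (64, 8):
    total = preallocated_wal_bytes(50, size_mb)
    print(f"{size_mb} MB segments x 50 empty partitions = {total / MIB:.0f} MiB")
```

With 50 empty partitions this gives 3200 MiB at 64 MB per segment and 400 MiB at 8 MB per segment.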
Created 06-14-2017 11:35 AM
What version of Kudu are you using? How exactly did you measure the new partition's space consumption?
Created 06-15-2017 12:46 AM
First of all, thank you for your reply.
Kudu version:
$ kudu -version
kudu 1.3.0-cdh5.11.0
revision 4dcf4a9d516865d249f4cb9b07f93c67e84614ae
build type RELEASE
built by jenkins at 12 Apr 2017 14:02:23 PST on impala-ec2-pkg-centos-7-0cb2.vpc.cloudera.com
build id 2017-04-12_13-25-54
To see the space consumption, I first see how much used space I have in the FS:
/dev/mapper/vg_hadoop-lv_hadoop 99G 2.3G 92G 3% /opt/hadoop
Then I create a table using Impala with many partitions by range (50 for this example):
[bigdata04dev.cpd:21000] > CREATE TABLE vndr1.TEST_API_KUDU (
> starttime BIGINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> rnc STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> nodeb STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> cell STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> vendor STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> ne_version STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> oss_avail_time STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> download_time STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> load_time STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> gp INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> call_attempts INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> data_traffic INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> drop_calls INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> error_bits INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> rab_fails INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> rrc_fails INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> voice_traffic INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> used_power INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> packet_duration INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> PRIMARY KEY (starttime, rnc, nodeb, cell)
> )
> PARTITION BY RANGE (starttime)
> (
> PARTITION VALUES < unix_timestamp('2016-05-22'),
> PARTITION unix_timestamp('2016-05-22') <= VALUES < unix_timestamp('2016-05-23'),
> PARTITION unix_timestamp('2016-05-23') <= VALUES < unix_timestamp('2016-05-24'),
> PARTITION unix_timestamp('2016-05-24') <= VALUES < unix_timestamp('2016-05-25'),
> PARTITION unix_timestamp('2016-05-25') <= VALUES < unix_timestamp('2016-05-26'),
> PARTITION unix_timestamp('2016-05-26') <= VALUES < unix_timestamp('2016-05-27'),
> PARTITION unix_timestamp('2016-05-27') <= VALUES < unix_timestamp('2016-05-28'),
> PARTITION unix_timestamp('2016-05-28') <= VALUES < unix_timestamp('2016-05-29'),
> PARTITION unix_timestamp('2016-05-29') <= VALUES < unix_timestamp('2016-05-30'),
> PARTITION unix_timestamp('2016-05-30') <= VALUES < unix_timestamp('2016-05-31'),
> PARTITION unix_timestamp('2016-05-31') <= VALUES < unix_timestamp('2016-06-01'),
> PARTITION unix_timestamp('2016-06-01') <= VALUES < unix_timestamp('2016-06-02'),
> PARTITION unix_timestamp('2016-06-02') <= VALUES < unix_timestamp('2016-06-03'),
> PARTITION unix_timestamp('2016-06-03') <= VALUES < unix_timestamp('2016-06-04'),
> PARTITION unix_timestamp('2016-06-04') <= VALUES < unix_timestamp('2016-06-05'),
> PARTITION unix_timestamp('2016-06-05') <= VALUES < unix_timestamp('2016-06-06'),
> PARTITION unix_timestamp('2016-06-06') <= VALUES < unix_timestamp('2016-06-07'),
> PARTITION unix_timestamp('2016-06-07') <= VALUES < unix_timestamp('2016-06-08'),
> PARTITION unix_timestamp('2016-06-08') <= VALUES < unix_timestamp('2016-06-09'),
> PARTITION unix_timestamp('2016-06-09') <= VALUES < unix_timestamp('2016-06-10'),
> PARTITION unix_timestamp('2016-06-10') <= VALUES < unix_timestamp('2016-06-11'),
> PARTITION unix_timestamp('2016-06-11') <= VALUES < unix_timestamp('2016-06-12'),
> PARTITION unix_timestamp('2016-06-12') <= VALUES < unix_timestamp('2016-06-13'),
> PARTITION unix_timestamp('2016-06-13') <= VALUES < unix_timestamp('2016-06-14'),
> PARTITION unix_timestamp('2016-06-14') <= VALUES < unix_timestamp('2016-06-15'),
> PARTITION unix_timestamp('2016-06-15') <= VALUES < unix_timestamp('2016-06-16'),
> PARTITION unix_timestamp('2016-06-16') <= VALUES < unix_timestamp('2016-06-17'),
> PARTITION unix_timestamp('2016-06-17') <= VALUES < unix_timestamp('2016-06-18'),
> PARTITION unix_timestamp('2016-06-18') <= VALUES < unix_timestamp('2016-06-19'),
> PARTITION unix_timestamp('2016-06-19') <= VALUES < unix_timestamp('2016-06-20'),
> PARTITION unix_timestamp('2016-06-20') <= VALUES < unix_timestamp('2016-06-21'),
> PARTITION unix_timestamp('2016-06-21') <= VALUES < unix_timestamp('2016-06-22'),
> PARTITION unix_timestamp('2016-06-22') <= VALUES < unix_timestamp('2016-06-23'),
> PARTITION unix_timestamp('2016-06-23') <= VALUES < unix_timestamp('2016-06-24'),
> PARTITION unix_timestamp('2016-06-24') <= VALUES < unix_timestamp('2016-06-25'),
> PARTITION unix_timestamp('2016-06-25') <= VALUES < unix_timestamp('2016-06-26'),
> PARTITION unix_timestamp('2016-06-26') <= VALUES < unix_timestamp('2016-06-27'),
> PARTITION unix_timestamp('2016-06-27') <= VALUES < unix_timestamp('2016-06-28'),
> PARTITION unix_timestamp('2016-06-28') <= VALUES < unix_timestamp('2016-06-29'),
> PARTITION unix_timestamp('2016-06-29') <= VALUES < unix_timestamp('2016-06-30'),
> PARTITION unix_timestamp('2016-06-30') <= VALUES < unix_timestamp('2016-07-01'),
> PARTITION unix_timestamp('2016-07-01') <= VALUES < unix_timestamp('2016-07-02'),
> PARTITION unix_timestamp('2016-07-02') <= VALUES < unix_timestamp('2016-07-03'),
> PARTITION unix_timestamp('2016-07-03') <= VALUES < unix_timestamp('2016-07-04'),
> PARTITION unix_timestamp('2016-07-04') <= VALUES < unix_timestamp('2016-07-05'),
> PARTITION unix_timestamp('2016-07-05') <= VALUES < unix_timestamp('2016-07-06'),
> PARTITION unix_timestamp('2016-07-06') <= VALUES < unix_timestamp('2016-07-07'),
> PARTITION unix_timestamp('2016-07-07') <= VALUES < unix_timestamp('2016-07-08'),
> PARTITION unix_timestamp('2016-07-08') <= VALUES < unix_timestamp('2016-07-09'),
> PARTITION unix_timestamp('2016-07-09') <= VALUES < unix_timestamp('2016-07-10')
> )
> STORED AS KUDU
> TBLPROPERTIES ('kudu.master_addresses'='192.168.10.35', 'kudu.table_name'='vndr1.TEST_API_KUDU');
Query: create TABLE vndr1.TEST_API_KUDU ( starttime BIGINT NOT 
NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, rnc STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, nodeb STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, cell STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, vendor STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, ne_version STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, oss_avail_time STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, download_time STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, load_time STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, gp INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, call_attempts INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, data_traffic INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, drop_calls INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, error_bits INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, rab_fails INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, rrc_fails INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, voice_traffic INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, used_power INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, packet_duration INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, PRIMARY KEY (starttime, rnc, nodeb, cell) ) PARTITION BY RANGE (starttime) ( PARTITION VALUES < unix_timestamp('2016-05-22'), PARTITION unix_timestamp('2016-05-22') <= VALUES < unix_timestamp('2016-05-23'), PARTITION unix_timestamp('2016-05-23') <= VALUES < unix_timestamp('2016-05-24'), PARTITION unix_timestamp('2016-05-24') <= VALUES < unix_timestamp('2016-05-25'), PARTITION unix_timestamp('2016-05-25') <= VALUES < unix_timestamp('2016-05-26'), PARTITION unix_timestamp('2016-05-26') <= VALUES < unix_timestamp('2016-05-27'), PARTITION 
unix_timestamp('2016-05-27') <= VALUES < unix_timestamp('2016-05-28'), PARTITION unix_timestamp('2016-05-28') <= VALUES < unix_timestamp('2016-05-29'), PARTITION unix_timestamp('2016-05-29') <= VALUES < unix_timestamp('2016-05-30'), PARTITION unix_timestamp('2016-05-30') <= VALUES < unix_timestamp('2016-05-31'), PARTITION unix_timestamp('2016-05-31') <= VALUES < unix_timestamp('2016-06-01'), PARTITION unix_timestamp('2016-06-01') <= VALUES < unix_timestamp('2016-06-02'), PARTITION unix_timestamp('2016-06-02') <= VALUES < unix_timestamp('2016-06-03'), PARTITION unix_timestamp('2016-06-03') <= VALUES < unix_timestamp('2016-06-04'), PARTITION unix_timestamp('2016-06-04') <= VALUES < unix_timestamp('2016-06-05'), PARTITION unix_timestamp('2016-06-05') <= VALUES < unix_timestamp('2016-06-06'), PARTITION unix_timestamp('2016-06-06') <= VALUES < unix_timestamp('2016-06-07'), PARTITION unix_timestamp('2016-06-07') <= VALUES < unix_timestamp('2016-06-08'), PARTITION unix_timestamp('2016-06-08') <= VALUES < unix_timestamp('2016-06-09'), PARTITION unix_timestamp('2016-06-09') <= VALUES < unix_timestamp('2016-06-10'), PARTITION unix_timestamp('2016-06-10') <= VALUES < unix_timestamp('2016-06-11'), PARTITION unix_timestamp('2016-06-11') <= VALUES < unix_timestamp('2016-06-12'), PARTITION unix_timestamp('2016-06-12') <= VALUES < unix_timestamp('2016-06-13'), PARTITION unix_timestamp('2016-06-13') <= VALUES < unix_timestamp('2016-06-14'), PARTITION unix_timestamp('2016-06-14') <= VALUES < unix_timestamp('2016-06-15'), PARTITION unix_timestamp('2016-06-15') <= VALUES < unix_timestamp('2016-06-16'), PARTITION unix_timestamp('2016-06-16') <= VALUES < unix_timestamp('2016-06-17'), PARTITION unix_timestamp('2016-06-17') <= VALUES < unix_timestamp('2016-06-18'), PARTITION unix_timestamp('2016-06-18') <= VALUES < unix_timestamp('2016-06-19'), PARTITION unix_timestamp('2016-06-19') <= VALUES < unix_timestamp('2016-06-20'), PARTITION unix_timestamp('2016-06-20') <= VALUES < 
unix_timestamp('2016-06-21'), PARTITION unix_timestamp('2016-06-21') <= VALUES < unix_timestamp('2016-06-22'), PARTITION unix_timestamp('2016-06-22') <= VALUES < unix_timestamp('2016-06-23'), PARTITION unix_timestamp('2016-06-23') <= VALUES < unix_timestamp('2016-06-24'), PARTITION unix_timestamp('2016-06-24') <= VALUES < unix_timestamp('2016-06-25'), PARTITION unix_timestamp('2016-06-25') <= VALUES < unix_timestamp('2016-06-26'), PARTITION unix_timestamp('2016-06-26') <= VALUES < unix_timestamp('2016-06-27'), PARTITION unix_timestamp('2016-06-27') <= VALUES < unix_timestamp('2016-06-28'), PARTITION unix_timestamp('2016-06-28') <= VALUES < unix_timestamp('2016-06-29'), PARTITION unix_timestamp('2016-06-29') <= VALUES < unix_timestamp('2016-06-30'), PARTITION unix_timestamp('2016-06-30') <= VALUES < unix_timestamp('2016-07-01'), PARTITION unix_timestamp('2016-07-01') <= VALUES < unix_timestamp('2016-07-02'), PARTITION unix_timestamp('2016-07-02') <= VALUES < unix_timestamp('2016-07-03'), PARTITION unix_timestamp('2016-07-03') <= VALUES < unix_timestamp('2016-07-04'), PARTITION unix_timestamp('2016-07-04') <= VALUES < unix_timestamp('2016-07-05'), PARTITION unix_timestamp('2016-07-05') <= VALUES < unix_timestamp('2016-07-06'), PARTITION unix_timestamp('2016-07-06') <= VALUES < unix_timestamp('2016-07-07'), PARTITION unix_timestamp('2016-07-07') <= VALUES < unix_timestamp('2016-07-08'), PARTITION unix_timestamp('2016-07-08') <= VALUES < unix_timestamp('2016-07-09'), PARTITION unix_timestamp('2016-07-09') <= VALUES < unix_timestamp('2016-07-10') ) STORED AS KUDU TBLPROPERTIES ('kudu.master_addresses'='192.168.10.35', 'kudu.table_name'='vndr1.TEST_API_KUDU') Fetched 0 row(s) in 0.80s [bigdata04dev.cpd:21000] >
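For anyone reproducing this test with a different number of partitions, the daily range clauses in the statement above do not have to be typed by hand. The following Python sketch generates them; the helper name `daily_range_partitions` is hypothetical, and the output is only the list of PARTITION clauses, not the full CREATE TABLE statement.

```python
from datetime import date, timedelta


def daily_range_partitions(first_day: date, last_day: date) -> list[str]:
    """Build one open-ended lower partition plus one partition per day
    covering [first_day, last_day), in the same form as the statement above."""
    clauses = [f"PARTITION VALUES < unix_timestamp('{first_day.isoformat()}')"]
    day = first_day
    while day < last_day:
        next_day = day + timedelta(days=1)
        clauses.append(
            f"PARTITION unix_timestamp('{day.isoformat()}') <= VALUES "
            f"< unix_timestamp('{next_day.isoformat()}')"
        )
        day = next_day
    return clauses


# Same date range as the test table: 1 open-ended partition + 49 daily ones.
clauses = daily_range_partitions(date(2016, 5, 22), date(2016, 7, 10))
print(len(clauses))
print(",\n".join(clauses))
```

The joined output can be pasted between the parentheses that follow PARTITION BY RANGE (starttime).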
And then I see the used space again for this FS:
/dev/mapper/vg_hadoop-lv_hadoop 99G 5.4G 89G 6% /opt/hadoop
As you can see, usage went from 2.3G to 5.4G with just 50 range partitions. That is around 63.5M per partition, and the table is empty.
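The per-partition figure can be checked with a couple of lines of Python. This is only a back-of-the-envelope sketch: it assumes the "G" in the filesystem listing means GiB (2^30 bytes), and since the listing rounds to one decimal place the result is approximate.

```python
# Values taken from the two filesystem listings above (assumed to be GiB).
used_before_gib = 2.3
used_after_gib = 5.4
partitions = 50

# Convert the growth to MiB and spread it evenly over the new partitions.
per_partition_mib = (used_after_gib - used_before_gib) * 1024 / partitions
print(f"{per_partition_mib:.1f} MiB per partition")
```

The result is just under 64 MiB per partition.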
Thank you very much
Created 06-15-2017 09:43 AM
It would be interesting (and useful) to see what new files were created as part of the CREATE TABLE. I suspect the increase in consumption was due to new WAL segments. We create one per partition and in Kudu 1.3 I believe we preallocate it to 32M.
Can you confirm this? Find Kudu's configured WAL directory and look at its contents before and after the CREATE TABLE.
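One way to do that comparison without eyeballing directory listings is a small Python script along these lines. It is a sketch, not a Kudu tool: the function names are made up, it assumes the WAL directory contains one subdirectory per tablet, and it relies on `st_blocks` being reported in 512-byte units, which holds on Linux. It reports both the apparent size and the space actually allocated, because preallocated or sparse files can differ between the two.

```python
import os


def dir_sizes(path: str) -> tuple[int, int]:
    """Return (apparent_bytes, allocated_bytes) for all files under path."""
    apparent = 0
    allocated = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            st = os.lstat(os.path.join(root, name))
            apparent += st.st_size
            allocated += st.st_blocks * 512  # assumption: 512-byte block units
    return apparent, allocated


def wal_usage_by_tablet(wal_dir: str) -> dict[str, tuple[int, int]]:
    """Map each subdirectory of wal_dir to its (apparent, allocated) bytes."""
    return {
        entry.name: dir_sizes(entry.path)
        for entry in os.scandir(wal_dir)
        if entry.is_dir(follow_symlinks=False)
    }


# Usage (the path is a placeholder for your configured WAL directory):
#   before = wal_usage_by_tablet("/path/to/kudu/wals")
#   ... run the CREATE TABLE ...
#   after = wal_usage_by_tablet("/path/to/kudu/wals")
#   new_dirs = {name: sizes for name, sizes in after.items() if name not in before}
```

Run it as a user that can read the WAL directory (the listings in this thread show it owned by kudu with mode drwx------).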
Created 06-16-2017 01:54 AM
Hi,
The WAL directory is empty (there are no tables in Kudu right now):
bash-4.2$ ls -last
total 148
144 drwx------ 2 kudu kudu 143360 Jun 16 10:45 .
4 drwx------ 6 kudu kudu 4096 May 16 11:05 ..
bash-4.2$ pwd
/opt/hadoop/kudu/tserver/wals
And when I create the table (using the same script as before) the content of the folder is this:
bash-4.2$ pwd
/opt/hadoop/kudu/tserver/wals
bash-4.2$ ls -last
total 348
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 44738a508974486089d1af0b6fa07caa
144 drwx------ 52 kudu kudu 143360 Jun 16 10:48 .
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 5fb3fe9397e94ec0a042a152de68c49f
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 cd102f09427e4b61bd5f799a409371a0
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 5a68aac0f8e1452aad66bc23f1632997
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 3ff60b2b885b4db89adf8acba0c0f9bb
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 87f8a8fe81c9436d86439a827ed2abd9
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 531d52acc8fb467b9deb2608b60eb028
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 ade15170415f442dba2ac697eb882fbe
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 6a68fab92bb64f57a5dd9d521995cee9
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 5894dbdb7a7748de92e0d65102948f58
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 1cfa68a85b964d2ba41b64ac8b9523d3
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 2cef1717b4c646dfa93c7962438bd2e5
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 067a8f2b467b45af94225a099e18b4c8
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 c3f18cd8a1564e56a3cab5e7502ca5bb
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 03a8eb79664841509380078ddee7b335
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 dc57712b1d324567bcd9a7fd22e850b7
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 ec4fde3b8e2247c6ab2e0f54b0ec85fb
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 91d019cf106544a597e5f4e59e07a3db
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 352b0f986534489b9e6aab9e3bd9cbbd
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 ca21b5be58f8440c9be81da103befe2a
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 1bb6c183bdeb4acf849f772996c4e8bb
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 ec577d25518745dfa83c81301f0835f3
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 7108c9e5a6fa4a9abe2d1cff3488b34f
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 4fe6fddfe52842fe9dd73929924f48ef
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 e7c3f2130dfd49d3abc61949faaa22d1
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 60509a5c83ef4aeca27acb9803efad34
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 6b6d0c99a47847ee85ec47a25adbd562
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 8e16c9d9dc0d4b0ba7c766d0cd52bc0d
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 5bb9d62f9d6e41f3bb8561699d7e2ec7
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 ae880022adbf4e15b33d040f697dd41b
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 1505fda0c54b4d6985f972adee2653e5
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 731ab85af7d147d2bb073a303afbcd3a
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 5e0ed0b49cd24308a17579efd28a4e32
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 7f530a9b75de4d8e963edf94bf26d5ed
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 8da88e8125d94d79902657186ec64a1f
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 96bb669d137f43bca91c7092d58ab1d4
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 80c5525e11f84110bc2bfd753319d4ad
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 3eedf81e4d6249d88542d1b8c25c8835
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 9bab107a1fef479caea27fa1c3274ddc
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 e8894d2e4e434954b4ce3098e548ea7f
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 224f0d63c67b467487ad907e9190e186
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 d04cfd76d7f544fc95a8c2c48ae29180
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 812e787e1c5f4322bf2e8c889e948b11
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 852884d2a60e41a19280105fbd1f4214
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 b8bb2809b0ea41deaa2c6fd558741fdc
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 e24d509a924e47f78331bd35814d13bc
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 2de230d2953e4f74b862d8582ad49de3
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 69ad6736611f48b7bafe39bc356b4c19
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 f097ea00099145cc9674404d6c8d7de1
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 b3f038f88e58496990a8f4500ffcf2b1
4 drwx------ 6 kudu kudu 4096 May 16 11:05 ..
If I look inside one of the folders, I see this:
bash-4.2$ cd f097ea00099145cc9674404d6c8d7de1/
bash-4.2$ ls -alst
total 65688
144 drwx------ 52 kudu kudu 143360 Jun 16 10:48 ..
65536 -rw------- 1 kudu kudu 67108864 Jun 16 10:48 wal-000000001
4 -rw------- 1 kudu kudu 24000000 Jun 16 10:48 index.000000000
4 drwx------ 2 kudu kudu 4096 Jun 16 10:48 .
bash-4.2$
The WAL file is exactly 64 MiB in size (it seems to be a default allocation), but the table is empty. Is there a way to configure this default allocation to be smaller?
Thanks a lot.