Space occupied by Kudu partition

Explorer

Hi,

I've noticed that when I create an empty partition in Kudu, it occupies around 65 MiB on disk. I have some cases with a huge number of partitions, and this space is eating up the disk, even though the partitions are empty! Is there a way to change this default space occupied per partition? If so, how?

Thank you very much!

5 REPLIES

Expert Contributor

What version of Kudu are you using? How exactly did you measure the new partition's space consumption?

 

Explorer

First of all, thank you for your reply.

 

Kudu version:

 

 

$ kudu -version
kudu 1.3.0-cdh5.11.0
revision 4dcf4a9d516865d249f4cb9b07f93c67e84614ae
build type RELEASE
built by jenkins at 12 Apr 2017 14:02:23 PST on impala-ec2-pkg-centos-7-0cb2.vpc.cloudera.com
build id 2017-04-12_13-25-54

To measure the space consumption, I first check how much space is used on the filesystem:

 

/dev/mapper/vg_hadoop-lv_hadoop   99G  2.3G   92G   3% /opt/hadoop

 

Then I create a table using Impala with many partitions by range (50 for this example):

 

 

[bigdata04dev.cpd:21000] > CREATE TABLE vndr1.TEST_API_KUDU (
                         >   starttime BIGINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   rnc STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   nodeb STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   cell STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   vendor STRING NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   ne_version STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   oss_avail_time STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   download_time STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   load_time STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   gp INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   call_attempts INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   data_traffic INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   drop_calls INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   error_bits INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   rab_fails INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   rrc_fails INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   voice_traffic INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   used_power INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   packet_duration INT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
                         >   PRIMARY KEY (starttime, rnc, nodeb, cell)
                         > )
                         > PARTITION BY RANGE (starttime)
                         > (
                         >   PARTITION VALUES < unix_timestamp('2016-05-22'),
                         >   PARTITION unix_timestamp('2016-05-22') <= VALUES < unix_timestamp('2016-05-23'),
                         >   PARTITION unix_timestamp('2016-05-23') <= VALUES < unix_timestamp('2016-05-24'),
                         >   PARTITION unix_timestamp('2016-05-24') <= VALUES < unix_timestamp('2016-05-25'),
                         >   PARTITION unix_timestamp('2016-05-25') <= VALUES < unix_timestamp('2016-05-26'),
                         >   PARTITION unix_timestamp('2016-05-26') <= VALUES < unix_timestamp('2016-05-27'),
                         >   PARTITION unix_timestamp('2016-05-27') <= VALUES < unix_timestamp('2016-05-28'),
                         >   PARTITION unix_timestamp('2016-05-28') <= VALUES < unix_timestamp('2016-05-29'),
                         >   PARTITION unix_timestamp('2016-05-29') <= VALUES < unix_timestamp('2016-05-30'),
                         >   PARTITION unix_timestamp('2016-05-30') <= VALUES < unix_timestamp('2016-05-31'),
                         >   PARTITION unix_timestamp('2016-05-31') <= VALUES < unix_timestamp('2016-06-01'),
                         >   PARTITION unix_timestamp('2016-06-01') <= VALUES < unix_timestamp('2016-06-02'),
                         >   PARTITION unix_timestamp('2016-06-02') <= VALUES < unix_timestamp('2016-06-03'),
                         >   PARTITION unix_timestamp('2016-06-03') <= VALUES < unix_timestamp('2016-06-04'),
                         >   PARTITION unix_timestamp('2016-06-04') <= VALUES < unix_timestamp('2016-06-05'),
                         >   PARTITION unix_timestamp('2016-06-05') <= VALUES < unix_timestamp('2016-06-06'),
                         >   PARTITION unix_timestamp('2016-06-06') <= VALUES < unix_timestamp('2016-06-07'),
                         >   PARTITION unix_timestamp('2016-06-07') <= VALUES < unix_timestamp('2016-06-08'),
                         >   PARTITION unix_timestamp('2016-06-08') <= VALUES < unix_timestamp('2016-06-09'),
                         >   PARTITION unix_timestamp('2016-06-09') <= VALUES < unix_timestamp('2016-06-10'),
                         >   PARTITION unix_timestamp('2016-06-10') <= VALUES < unix_timestamp('2016-06-11'),
                         >   PARTITION unix_timestamp('2016-06-11') <= VALUES < unix_timestamp('2016-06-12'),
                         >   PARTITION unix_timestamp('2016-06-12') <= VALUES < unix_timestamp('2016-06-13'),
                         >   PARTITION unix_timestamp('2016-06-13') <= VALUES < unix_timestamp('2016-06-14'),
                         >   PARTITION unix_timestamp('2016-06-14') <= VALUES < unix_timestamp('2016-06-15'),
                         >   PARTITION unix_timestamp('2016-06-15') <= VALUES < unix_timestamp('2016-06-16'),
                         >   PARTITION unix_timestamp('2016-06-16') <= VALUES < unix_timestamp('2016-06-17'),
                         >   PARTITION unix_timestamp('2016-06-17') <= VALUES < unix_timestamp('2016-06-18'),
                         >   PARTITION unix_timestamp('2016-06-18') <= VALUES < unix_timestamp('2016-06-19'),
                         >   PARTITION unix_timestamp('2016-06-19') <= VALUES < unix_timestamp('2016-06-20'),
                         >   PARTITION unix_timestamp('2016-06-20') <= VALUES < unix_timestamp('2016-06-21'),
                         >   PARTITION unix_timestamp('2016-06-21') <= VALUES < unix_timestamp('2016-06-22'),
                         >   PARTITION unix_timestamp('2016-06-22') <= VALUES < unix_timestamp('2016-06-23'),
                         >   PARTITION unix_timestamp('2016-06-23') <= VALUES < unix_timestamp('2016-06-24'),
                         >   PARTITION unix_timestamp('2016-06-24') <= VALUES < unix_timestamp('2016-06-25'),
                         >   PARTITION unix_timestamp('2016-06-25') <= VALUES < unix_timestamp('2016-06-26'),
                         >   PARTITION unix_timestamp('2016-06-26') <= VALUES < unix_timestamp('2016-06-27'),
                         >   PARTITION unix_timestamp('2016-06-27') <= VALUES < unix_timestamp('2016-06-28'),
                         >   PARTITION unix_timestamp('2016-06-28') <= VALUES < unix_timestamp('2016-06-29'),
                         >   PARTITION unix_timestamp('2016-06-29') <= VALUES < unix_timestamp('2016-06-30'),
                         >   PARTITION unix_timestamp('2016-06-30') <= VALUES < unix_timestamp('2016-07-01'),
                         >   PARTITION unix_timestamp('2016-07-01') <= VALUES < unix_timestamp('2016-07-02'),
                         >   PARTITION unix_timestamp('2016-07-02') <= VALUES < unix_timestamp('2016-07-03'),
                         >   PARTITION unix_timestamp('2016-07-03') <= VALUES < unix_timestamp('2016-07-04'),
                         >   PARTITION unix_timestamp('2016-07-04') <= VALUES < unix_timestamp('2016-07-05'),
                         >   PARTITION unix_timestamp('2016-07-05') <= VALUES < unix_timestamp('2016-07-06'),
                         >   PARTITION unix_timestamp('2016-07-06') <= VALUES < unix_timestamp('2016-07-07'),
                         >   PARTITION unix_timestamp('2016-07-07') <= VALUES < unix_timestamp('2016-07-08'),
                         >   PARTITION unix_timestamp('2016-07-08') <= VALUES < unix_timestamp('2016-07-09'),
                         >   PARTITION unix_timestamp('2016-07-09') <= VALUES < unix_timestamp('2016-07-10')
                         > )
                         > STORED AS KUDU
                         > TBLPROPERTIES ('kudu.master_addresses'='192.168.10.35', 'kudu.table_name'='vndr1.TEST_API_KUDU');
Fetched 0 row(s) in 0.80s
[bigdata04dev.cpd:21000] >
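For anyone reproducing this test, the 50 partition clauses do not need to be typed by hand. The following is a minimal sketch that generates the same clauses as the statement above; the function name `daily_range_partitions` is made up for illustration, and the output still has to be pasted into a CREATE TABLE statement manually.

```python
from datetime import date, timedelta


def daily_range_partitions(first_day, last_day_exclusive):
    """Build range-partition clauses in the style used in the post:
    one open-ended partition below first_day, then one partition per day
    up to (but not including) last_day_exclusive."""
    clauses = [f"PARTITION VALUES < unix_timestamp('{first_day.isoformat()}')"]
    day = first_day
    while day < last_day_exclusive:
        next_day = day + timedelta(days=1)
        clauses.append(
            f"PARTITION unix_timestamp('{day.isoformat()}') <= VALUES "
            f"< unix_timestamp('{next_day.isoformat()}')"
        )
        day = next_day
    return clauses


# Same date range as the table in the post: 2016-05-22 up to 2016-07-10.
clauses = daily_range_partitions(date(2016, 5, 22), date(2016, 7, 10))
print(",\n".join("  " + clause for clause in clauses))
```

With this date range the list holds 50 clauses: the open-ended one plus 49 daily ranges.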

Then I check the used space on the filesystem again:

 

/dev/mapper/vg_hadoop-lv_hadoop   99G  5.4G   89G   6% /opt/hadoop

 

As you can see, usage went from 2.3G to 5.4G with just 50 range partitions. That is around 63.5M per partition, and the table is empty.
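The per-partition figure can be checked with a few lines of arithmetic. This sketch assumes the `df` values are binary units (GiB) and takes them exactly as printed, so the result is only as precise as the one-decimal rounding in the output.

```python
# Figures as printed by df in the post (rounded to one decimal place).
used_before_gib = 2.3
used_after_gib = 5.4
partitions = 50

# Convert the growth to MiB and spread it evenly over the partitions.
growth_mib = (used_after_gib - used_before_gib) * 1024
per_partition_mib = growth_mib / partitions

print(f"growth: {growth_mib:.0f} MiB, per partition: {per_partition_mib:.1f} MiB")
```

The result lands at roughly 63.5 MiB per partition, which is close to a 64 MiB allocation per partition once rounding is taken into account.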

 

Thank you very much

 

Expert Contributor

It would be interesting (and useful) to see what new files were created as part of the CREATE TABLE. I suspect the increase in consumption was due to new WAL segments. We create one per partition and in Kudu 1.3 I believe we preallocate it to 32M.

 

Can you confirm this? Find Kudu's configured WAL directory and look at its contents before and after the CREATE TABLE.
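One way to do the before/after comparison without eyeballing two listings is to snapshot the directory entries and diff them. This is a rough sketch, not a Kudu tool; the helper names `snapshot` and `new_entries` are hypothetical, and the WAL path has to be whatever the tablet server is actually configured with.

```python
import os


def snapshot(directory):
    """Return the set of entry names currently in the directory."""
    return set(os.listdir(directory))


def new_entries(before, after):
    """Return the names present in the second snapshot but not the first."""
    return sorted(after - before)
```

Usage would be: take `before = snapshot(wal_dir)`, run the CREATE TABLE, take `after = snapshot(wal_dir)`, and print `new_entries(before, after)`. If the suspicion is right, one new subdirectory should show up per partition hosted on that tablet server.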

Explorer

Hi,

 

The WAL directory is empty (there are no tables in Kudu right now):

 

bash-4.2$ ls -last
total 148
144 drwx------ 2 kudu kudu 143360 Jun 16 10:45 .
  4 drwx------ 6 kudu kudu   4096 May 16 11:05 ..
bash-4.2$ pwd
/opt/hadoop/kudu/tserver/wals

And when I create the table (using the same script as before) the content of the folder is this:

 

bash-4.2$ pwd
/opt/hadoop/kudu/tserver/wals
bash-4.2$ ls -last
total 348
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 44738a508974486089d1af0b6fa07caa
144 drwx------ 52 kudu kudu 143360 Jun 16 10:48 .
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 5fb3fe9397e94ec0a042a152de68c49f
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 cd102f09427e4b61bd5f799a409371a0
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 5a68aac0f8e1452aad66bc23f1632997
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 3ff60b2b885b4db89adf8acba0c0f9bb
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 87f8a8fe81c9436d86439a827ed2abd9
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 531d52acc8fb467b9deb2608b60eb028
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 ade15170415f442dba2ac697eb882fbe
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 6a68fab92bb64f57a5dd9d521995cee9
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 5894dbdb7a7748de92e0d65102948f58
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 1cfa68a85b964d2ba41b64ac8b9523d3
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 2cef1717b4c646dfa93c7962438bd2e5
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 067a8f2b467b45af94225a099e18b4c8
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 c3f18cd8a1564e56a3cab5e7502ca5bb
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 03a8eb79664841509380078ddee7b335
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 dc57712b1d324567bcd9a7fd22e850b7
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 ec4fde3b8e2247c6ab2e0f54b0ec85fb
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 91d019cf106544a597e5f4e59e07a3db
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 352b0f986534489b9e6aab9e3bd9cbbd
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 ca21b5be58f8440c9be81da103befe2a
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 1bb6c183bdeb4acf849f772996c4e8bb
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 ec577d25518745dfa83c81301f0835f3
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 7108c9e5a6fa4a9abe2d1cff3488b34f
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 4fe6fddfe52842fe9dd73929924f48ef
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 e7c3f2130dfd49d3abc61949faaa22d1
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 60509a5c83ef4aeca27acb9803efad34
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 6b6d0c99a47847ee85ec47a25adbd562
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 8e16c9d9dc0d4b0ba7c766d0cd52bc0d
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 5bb9d62f9d6e41f3bb8561699d7e2ec7
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 ae880022adbf4e15b33d040f697dd41b
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 1505fda0c54b4d6985f972adee2653e5
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 731ab85af7d147d2bb073a303afbcd3a
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 5e0ed0b49cd24308a17579efd28a4e32
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 7f530a9b75de4d8e963edf94bf26d5ed
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 8da88e8125d94d79902657186ec64a1f
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 96bb669d137f43bca91c7092d58ab1d4
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 80c5525e11f84110bc2bfd753319d4ad
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 3eedf81e4d6249d88542d1b8c25c8835
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 9bab107a1fef479caea27fa1c3274ddc
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 e8894d2e4e434954b4ce3098e548ea7f
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 224f0d63c67b467487ad907e9190e186
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 d04cfd76d7f544fc95a8c2c48ae29180
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 812e787e1c5f4322bf2e8c889e948b11
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 852884d2a60e41a19280105fbd1f4214
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 b8bb2809b0ea41deaa2c6fd558741fdc
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 e24d509a924e47f78331bd35814d13bc
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 2de230d2953e4f74b862d8582ad49de3
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 69ad6736611f48b7bafe39bc356b4c19
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 f097ea00099145cc9674404d6c8d7de1
  4 drwx------  2 kudu kudu   4096 Jun 16 10:48 b3f038f88e58496990a8f4500ffcf2b1
  4 drwx------  6 kudu kudu   4096 May 16 11:05 ..

If I look inside one of the folders, I see this:

 

bash-4.2$ cd f097ea00099145cc9674404d6c8d7de1/
bash-4.2$ ls -alst
total 65688
  144 drwx------ 52 kudu kudu   143360 Jun 16 10:48 ..
65536 -rw-------  1 kudu kudu 67108864 Jun 16 10:48 wal-000000001
    4 -rw-------  1 kudu kudu 24000000 Jun 16 10:48 index.000000000
    4 drwx------  2 kudu kudu     4096 Jun 16 10:48 .
bash-4.2$ 

The WAL file is exactly 64 MiB in size (it looks like a default allocation), even though the table is empty. Is there a way to configure this default allocation to be smaller?
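The listing above already separates the two files: the first column of `ls -s` is the space actually allocated, so wal-000000001 really occupies 65536 KiB, while index.000000000 has an apparent size of 24000000 bytes but only 4 KiB allocated, meaning it is sparse. The same check can be scripted. This is a minimal sketch for POSIX systems, with hypothetical helper names `size_report` and `report_dir`.

```python
import os


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for one file."""
    st = os.stat(path)
    # On POSIX systems st_blocks is counted in 512-byte units,
    # independent of the filesystem's own block size.
    return st.st_size, st.st_blocks * 512


def report_dir(directory):
    """Return (name, apparent_bytes, allocated_bytes) for each regular file."""
    rows = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path):
            apparent, allocated = size_report(full_path)
            rows.append((name, apparent, allocated))
    return rows
```

Calling `report_dir` on one of the tablet directories under /opt/hadoop/kudu/tserver/wals should show the WAL segment with matching apparent and allocated sizes, and the index file with a much smaller allocated size.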

 

Thanks a lot.

Expert Contributor

Yes, what you're observing is Kudu preallocating one 64 MB write-ahead log segment for each partition. The space will be filled once you start writing to the partition.

 

In Kudu 1.4 we dropped the segment size from 64 MB to 8 MB. If you'd like to make that change now, you can do so via the --log_segment_size_mb command line option. An alternative would be to disable preallocation via --log_async_preallocate_segments=false and/or --log_preallocate_segments=false, but that's not something we generally test so I would advise against it.
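To get a feel for what lowering --log_segment_size_mb would save, here is a back-of-the-envelope sketch. It assumes exactly one preallocated segment per partition, as described above, and ignores index files and other metadata; the function name `preallocated_wal_mib` is made up for illustration. How the flag is passed to the tablet server (command line, flag file, or a management console setting) depends on the deployment.

```python
def preallocated_wal_mib(partitions, segment_size_mb):
    """Space taken by one preallocated WAL segment per partition, in MiB."""
    return partitions * segment_size_mb


# Compare the 64 MB segment size seen in this thread with the 8 MB size
# mentioned for Kudu 1.4, using the 50 empty partitions of the test table.
for size_mb in (64, 8):
    total_mib = preallocated_wal_mib(50, size_mb)
    print(f"--log_segment_size_mb={size_mb}: about {total_mib} MiB for 50 empty partitions")
```

For the 50-partition test table this works out to about 3200 MiB at 64 MB per segment versus about 400 MiB at 8 MB.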