Registered: ‎06-30-2018

Kudu Encoding for small string

When creating a Kudu table:

For a very small string column, such as a 5-character string, is it appropriate to apply LZ4 compression?
For strings, we can apply either plain encoding or dictionary encoding.
Registered: ‎05-16-2016

Re: Kudu Encoding for small string

If storage is more important than scan read performance in your requirements, then yes, you can apply compression to the column. In my opinion, LZ4 offers the best trade-off between compression ratio and performance compared with Snappy or zlib.

What is the cardinality of the string column? Low or high?


Registered: ‎07-01-2015

Re: Kudu Encoding for small string

If the cardinality is not high, I would vote for dictionary encoding without compression. Dictionary encoding will save you a lot of space. However, I don't know Kudu's internals well enough to say where the limit is, that is, how many distinct values the dictionary can hold.
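
To make the cardinality argument concrete, here is a rough back-of-the-envelope sketch in Python. It is a toy model, not Kudu's actual on-disk format: the 4-byte per-value overhead, the bit-packed codes, and the function names `plain_size` and `dict_size` are all assumptions made for illustration only.

```python
def plain_size(values):
    """Toy estimate: UTF-8 bytes of every value plus an assumed 4-byte offset each."""
    return sum(len(v.encode("utf-8")) + 4 for v in values)


def dict_size(values):
    """Toy estimate: store each distinct value once, then one small code per row."""
    distinct = set(values)
    # Dictionary holds every distinct string once (same assumed 4-byte overhead).
    dictionary_bytes = sum(len(v.encode("utf-8")) + 4 for v in distinct)
    # Each row stores a bit-packed index into the dictionary.
    bits_per_code = max(1, (len(distinct) - 1).bit_length())
    code_bytes = (len(values) * bits_per_code + 7) // 8
    return dictionary_bytes + code_bytes


# Low cardinality: 4 distinct 5-character strings repeated over 1000 rows.
low_card = ["AAAAA", "BBBBB", "CCCCC", "DDDDD"] * 250
# High cardinality: 1000 distinct 5-character strings.
high_card = [f"{i:05d}" for i in range(1000)]

for name, column in (("low", low_card), ("high", high_card)):
    print(name, "plain:", plain_size(column), "dict:", dict_size(column))
```

In this model the low-cardinality column shrinks dramatically under dictionary encoding, while the all-distinct column ends up slightly larger than plain encoding because every value is stored in the dictionary anyway and the codes are pure overhead. Real numbers in Kudu will differ, so it is worth testing on a sample of your own data.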