Kudu Encoding for small string

New Contributor
When creating a Kudu table:

For a very short string, such as a 5-character string, is it appropriate to apply LZ4 compression?
For strings, we can apply either plain encoding or dictionary encoding.


If storage matters more than scan/read performance in your requirements, then yes, you can apply compression to it. In my opinion, LZ4 offers the best balance of compression and performance compared with Snappy or zlib.

What is the cardinality of the string column? Low or high?


Master Collaborator

If the cardinality is not high, I would vote for dictionary encoding without compression. Dictionary encoding will save you a lot of space. However, I don't know the internals of Kudu well enough to say where the limit is, that is, how many distinct values the dictionary can hold.
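
To make the cardinality argument concrete, here is a rough back-of-the-envelope sketch in Python. It is a toy model, not Kudu's actual on-disk format: the 4-byte per-value offset, the fixed-width dictionary codes, and the function names `plain_size` and `dict_size` are all assumptions made for illustration. It only shows why dictionary encoding tends to pay off for low-cardinality 5-character strings and stops paying off when almost every value is distinct.

```python
import math
import random
import string

OFFSET_BYTES = 4  # assumed per-value bookkeeping cost; not taken from Kudu


def plain_size(values):
    """Toy plain layout: every value's UTF-8 bytes plus one offset per row."""
    return sum(len(v.encode("utf-8")) + OFFSET_BYTES for v in values)


def dict_size(values):
    """Toy dictionary layout: each distinct value stored once,
    plus one fixed-width integer code per row."""
    distinct = set(values)
    # Bits needed to address codes 0 .. len(distinct) - 1 (at least 1 bit).
    bits = max(1, (len(distinct) - 1).bit_length())
    dictionary_bytes = sum(len(v.encode("utf-8")) + OFFSET_BYTES for v in distinct)
    code_bytes = math.ceil(len(values) * bits / 8)
    return dictionary_bytes + code_bytes


random.seed(0)
rows = 100_000

# Low cardinality: only four distinct 5-character values.
low = [random.choice(["AAAAA", "BBBBB", "CCCCC", "DDDDD"]) for _ in range(rows)]

# High cardinality: random 5-letter strings, almost all distinct.
high = ["".join(random.choices(string.ascii_uppercase, k=5)) for _ in range(rows)]

for name, column in (("low", low), ("high", high)):
    print(name, "plain:", plain_size(column), "dict:", dict_size(column))
```

In this model the low-cardinality column shrinks dramatically under dictionary encoding, while the high-cardinality column ends up larger than the plain layout, because the dictionary is nearly as big as the data and the per-row codes come on top. Real numbers for a Kudu table will differ, so it is worth measuring on a sample of your own data.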