
Kudu Encoding for small string

New Contributor
When creating a Kudu table, is it appropriate to apply LZ4 compression to a very small string column, for example a 5-character string?

For strings, we can apply either plain encoding or dictionary encoding.

Re: Kudu Encoding for small string

Champion

If storage is more important than scan/read performance in your requirement, then yes, you can apply compression to it. In my opinion, LZ4 gives the best trade-off between compression and performance compared with Snappy or zlib.

What is the cardinality of the string column? Low or high?

 

Re: Kudu Encoding for small string

Master Collaborator

If the cardinality is not high, I would vote for dictionary encoding without compression. The dictionary encoding will save you a lot of space. However, I don't know the internals of Kudu well enough to say where the limit is, that is, how many distinct values the dictionary can hold.
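
To make the cardinality argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not Kudu's actual on-disk format: the 4-byte per-value overhead, the bit-packed dictionary codes, and the function names `plain_size` and `dict_size` are all assumptions made purely for illustration.

```python
def plain_size(values):
    # Assumed cost model: UTF-8 bytes of each value plus a hypothetical
    # 4-byte length/offset entry per value.
    return sum(len(v.encode("utf-8")) + 4 for v in values)


def dict_size(values):
    # Assumed cost model: store each distinct value once (plain-encoded),
    # then one bit-packed integer code per row.
    distinct = set(values)
    dictionary_bytes = plain_size(distinct)
    bits_per_code = max(1, (len(distinct) - 1).bit_length())
    codes_bytes = (len(values) * bits_per_code + 7) // 8  # round up to bytes
    return dictionary_bytes + codes_bytes


# Low cardinality: 4 distinct 5-character strings repeated over 1000 rows.
low_card = ["AAAAA", "BBBBB", "CCCCC", "DDDDD"] * 250

# High cardinality: 1000 rows, every 5-character string is unique.
high_card = [f"{i:05d}" for i in range(1000)]

for name, column in (("low", low_card), ("high", high_card)):
    print(name, "plain:", plain_size(column), "dict:", dict_size(column))
```

Under these assumptions the low-cardinality column shrinks to a small fraction of its plain size, while the all-unique column ends up slightly larger with a dictionary than without one. That is why the cardinality question matters more than the string length. As far as I recall, the Kudu schema design documentation also says that Kudu falls back to plain encoding for a row set when there are too many unique values, so it is worth checking the docs for your version.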

 
