Support Questions

Aditya · 01-11-2017

I am currently studying compression codecs. I have found out which codec compresses the most, which compresses the least, and which is the slowest of all the available codecs, but I couldn't find out which one is the fastest. Tom White's book only mentions that LZO, LZ4 and Snappy are faster than GZIP; it doesn't say which of the three is the fastest. The Cloudera documentation likewise only states that Snappy is faster than LZO, and it recommends testing on your own data to measure how long LZO and Snappy take to compress and decompress it.

Searching Google, I found a document that claims LZ4 is the fastest of the three, based on tests the authors ran on some data. It is at the location below, but I am not sure about it because I cannot verify the document's authenticity.
http://www.slideshare.net/Hadoop_Summit/singh-kamat-june27425pmroom210c

So, can someone help me identify which is the fastest compression codec among LZO, LZ4 and Snappy?
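Following the documentation's advice to test on your own data, a minimal benchmarking sketch might look like the one below. It is only an approximation under stated assumptions: it uses Python bindings for the raw algorithms, not Hadoop's `CompressionCodec` classes, so JVM overhead, Hadoop's block framing and native-library setup are not measured. The third-party packages `python-lz4` (`lz4.frame`), `python-snappy` (`snappy`) and `python-lzo` (`lzo`) are assumptions; any that are not installed are simply skipped. The standard-library `zlib` (the DEFLATE algorithm that gzip uses) serves as a baseline. The helper names `available_codecs` and `benchmark` are hypothetical.

```python
import time
import zlib
from typing import Callable, Dict, Tuple

# A codec is a (compress, decompress) pair operating on bytes.
Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]


def available_codecs() -> Dict[str, Codec]:
    """Hypothetical helper: collect whichever codec bindings are installed."""
    codecs: Dict[str, Codec] = {
        # DEFLATE (the algorithm inside gzip), used as a slow-but-dense baseline.
        "zlib(deflate)": (lambda d: zlib.compress(d, 6), zlib.decompress),
    }
    try:
        import lz4.frame  # assumption: third-party python-lz4 package
        codecs["lz4"] = (lz4.frame.compress, lz4.frame.decompress)
    except (ImportError, AttributeError):
        pass
    try:
        import snappy  # assumption: third-party python-snappy package
        codecs["snappy"] = (snappy.compress, snappy.decompress)
    except (ImportError, AttributeError):
        pass
    try:
        import lzo  # assumption: third-party python-lzo package
        codecs["lzo"] = (lzo.compress, lzo.decompress)
    except (ImportError, AttributeError):
        pass
    return codecs


def benchmark(data: bytes, codecs: Dict[str, Codec],
              repeats: int = 5) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: best-of-N compress/decompress time and ratio per codec."""
    results: Dict[str, Dict[str, float]] = {}
    for name, (compress, decompress) in codecs.items():
        best_c = best_d = float("inf")
        compressed = b""
        for _ in range(repeats):
            t0 = time.perf_counter()
            compressed = compress(data)
            t1 = time.perf_counter()
            restored = decompress(compressed)
            t2 = time.perf_counter()
            # A codec that does not round-trip is useless however fast it is.
            if restored != data:
                raise ValueError(f"{name} did not round-trip the input")
            best_c = min(best_c, t1 - t0)
            best_d = min(best_d, t2 - t1)
        results[name] = {
            "ratio": len(data) / max(len(compressed), 1),
            "compress_s": best_c,
            "decompress_s": best_d,
        }
    return results


if __name__ == "__main__":
    # Synthetic CSV-like sample; replace with a sample of your real data,
    # e.g. open("your_sample_file", "rb").read() (hypothetical path).
    sample = b"timestamp,user_id,event\n" + b"".join(
        f"2017-01-11T00:00:{i % 60:02d},{i % 997},click\n".encode()
        for i in range(50_000)
    )
    ranked = sorted(benchmark(sample, available_codecs()).items(),
                    key=lambda kv: kv[1]["compress_s"])
    for name, r in ranked:
        print(f"{name:14s} ratio={r['ratio']:.2f} "
              f"compress={r['compress_s'] * 1000:.1f}ms "
              f"decompress={r['decompress_s'] * 1000:.1f}ms")
```

Taking the best of several runs reduces noise from caching and scheduling, and reporting the ratio alongside the timings makes the speed-versus-size trade-off visible. The authoritative test is still running your actual Hadoop jobs with each codec configured.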

Jim Halfpenny · 01-12-2017

Hi,
The speed of a compression codec is only part of the story; you should also consider how well the codec is supported across the different parts of the Hadoop stack. Gaining slightly faster compression at the expense of compatibility is probably not a good trade-off. For example, Snappy is supported by pretty much all of the stack, whereas LZ4 is not currently supported by Impala.

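If in doubt, I would stick with Snappy, since it is reasonably fast and is splittable when used inside container formats such as SequenceFile, Avro, Parquet or ORC (a raw Snappy-compressed text file is not splittable). If performance is an issue, you're likely to gain more by focusing on other parts of the stack than on data compression.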

Regards,
Jim

Cloudera Community

LZO, LZ4, SNAPPY - which is the fastest compression codec