Best compression technique?
Labels: Apache Hive
Created on 10-20-2017 06:13 PM - edited 09-16-2022 05:25 AM
Hi, I have tables in Hive. Some are in text format and some are ORC. What are the best compression methods for the text-format tables so that I can still query them?
Created 10-20-2017 06:39 PM
We are using ORC with Snappy on our production cluster. Here are a few useful links on compression:
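For readers who want to try this setup, here is a minimal sketch of an ORC table compressed with Snappy; the table and column names (events_orc, events_text, and so on) are made up for illustration, not taken from the thread:

```sql
-- Minimal sketch: an ORC table compressed with Snappy.
-- Table and column names are hypothetical.
CREATE TABLE events_orc (
  event_id   BIGINT,
  event_time TIMESTAMP,
  payload    STRING
)
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "SNAPPY");

-- Rewriting an existing text-format table into the ORC table
-- stores the data block-compressed with Snappy.
INSERT OVERWRITE TABLE events_orc
SELECT event_id, event_time, payload
FROM events_text;
```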
Created 10-20-2017 06:25 PM
Text files: since text-format tables are included, choose a compression algorithm that supports splits (gzip is a big no).
ORC: ORC does block-level compression internally, so ORC files are always splittable.
Overall: use ZLIB, Snappy, or LZO, keeping splittability in mind; see the sketch below.
orc.compress: the high-level compression setting for ORC (one of NONE, ZLIB, SNAPPY).
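To make the text-file advice concrete, here is a hedged sketch of producing splittable compressed text with Hive. BZip2 is used because it is a splittable codec for plain text (LZO is splittable only when indexed); the table names are hypothetical and not from the original thread:

```sql
-- Sketch: write query output as BZip2-compressed text.
-- BZip2 is splittable, so the resulting files can still be
-- processed by multiple mappers; gzip output would not be.
SET hive.exec.compress.output = true;
SET mapreduce.output.fileoutputformat.compress.codec = org.apache.hadoop.io.compress.BZip2Codec;

-- Hypothetical target table, plain delimited text.
CREATE TABLE logs_text_bz2 (
  log_line STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Hypothetical source table; the rewritten data lands as .bz2 files
-- that Hive can still query transparently.
INSERT OVERWRITE TABLE logs_text_bz2
SELECT log_line FROM logs_text_raw;
```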
Created 10-20-2017 06:53 PM
How about Parquet?
Created 10-21-2017 12:06 AM
ORC and Parquet are both columnar storage formats, and they are competitors in terms of support and development.
Created 10-21-2017 05:48 AM
ORC is the best option within Hive, and Parquet is the best option across the wider Hadoop ecosystem.
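For comparison with the ORC example above, a Parquet-backed Hive table with compression can be declared along these lines; the parquet.compression table property and the example names are a sketch based on common usage, not a verified recommendation for any particular Hive version:

```sql
-- Sketch: a Parquet table in Hive with Snappy compression.
-- Table and column names are hypothetical; parquet.compression
-- is assumed to be honored by the Hive/Parquet version in use.
CREATE TABLE events_parquet (
  event_id   BIGINT,
  event_time TIMESTAMP,
  payload    STRING
)
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression" = "SNAPPY");
```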
