Best compression technique?

Expert Contributor

Hi, I have tables in Hive. Some are in text format, some are ORC. What is the best compression method for the text files so that I can still query the tables?

1 ACCEPTED SOLUTION
5 REPLIES

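Text files: since text tables are involved, please choose a compression codec that supports splitting. Plain gzip (.gz) is a big no: a gzip file cannot be split, so each file ends up being read by a single task.
ORC: ORC compresses at the block level inside the file, so it is always splittable.

Overall: for ORC, use ZLIB or SNAPPY; for text files, use a splittable codec such as LZO (splittable once the files are indexed).

*orc.compress: the ORC table property that sets the high-level compression codec (one of NONE, ZLIB, SNAPPY)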
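
As a rough illustration of the advice above, here is a minimal HiveQL sketch. The table names (logs_text, logs_orc, logs_text_bz2) are hypothetical. The text example uses bzip2 rather than LZO only because bzip2 is also splittable and ships with Hadoop, whereas LZO usually needs a separately installed codec. The property name for the output codec assumes a Hadoop 2 style configuration; older clusters may use a different name.

```sql
-- Hypothetical names: logs_text is an existing text-format table.

-- Copy the text table into an ORC table with an explicit codec.
-- "orc.compress" accepts NONE, ZLIB or SNAPPY.
CREATE TABLE logs_orc
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "SNAPPY")
AS SELECT * FROM logs_text;

-- Keep a text-format copy, but have Hive write it with a splittable codec.
-- bzip2 is used here as an assumption; it is not named in the thread.
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.BZip2Codec;

CREATE TABLE logs_text_bz2
STORED AS TEXTFILE
AS SELECT * FROM logs_text;
```

Either resulting table can be queried with ordinary SELECT statements; Hive decompresses the data transparently when reading.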

Expert Contributor
@kgautam

How about Parquet?

ORC and Parquet are both columnar storage formats, and they are competitors in terms of support and development.

New Contributor

ORC is the best option within Hive, and Parquet is the best option across the Hadoop ecosystem.