Created on 10-20-2017 06:13 PM - edited 09-16-2022 05:25 AM
Hi, I have tables in Hive. Some are in text format, some are ORC. What are the best compression methods for the text files so that I can still query the tables?
Created 10-20-2017 06:39 PM
We are using ORC with Snappy on our production cluster. Here are a few useful links on compression:
Created 10-20-2017 06:25 PM
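Text file: since text files are also involved, please choose a compression codec that supports splitting. (gzip is a big no, because a gzip-compressed file cannot be split.)
ORC: ORC compresses at the block level, so its files are always splittable.
Overall: please use ZLIB or SNAPPY for ORC, and a splittable codec such as LZO (which is splittable once indexed) for text files.
*orc.compress: the ORC table property that sets the high-level compression codec (one of NONE, ZLIB, SNAPPY; the default is ZLIB)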
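A rough sketch of both cases in HiveQL is below. The table and column names (`events_text`, `events_text_bz2`, `events_orc`, `id`, `payload`) are hypothetical, and bzip2 is used only as one example of a splittable codec for text; the thread itself does not name it.

```sql
-- Hypothetical source table: events_text (id BIGINT, payload STRING), stored as plain text.

-- 1) Rewrite the text data with a splittable codec (bzip2 is an assumption here).
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.BZip2Codec;

CREATE TABLE events_text_bz2 (id BIGINT, payload STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

INSERT OVERWRITE TABLE events_text_bz2
SELECT id, payload FROM events_text;

-- 2) Store the same data as ORC, compressed with Snappy via the orc.compress property.
CREATE TABLE events_orc (id BIGINT, payload STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "SNAPPY");

INSERT OVERWRITE TABLE events_orc
SELECT id, payload FROM events_text;
```

Either table can still be queried with ordinary SELECT statements, since Hive decompresses the files transparently when reading.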
Created 10-20-2017 06:53 PM
How about Parquet?
Created 10-21-2017 12:06 AM
ORC and Parquet are both columnar storage formats, and they are competitors in terms of support and development.
Created 10-21-2017 05:48 AM
ORC is the best option within Hive, and Parquet is the best option across the Hadoop ecosystem.
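For anyone who wants to try Parquet, a comparable sketch follows. The table names are hypothetical, and Snappy is chosen only to mirror the ORC Snappy setup mentioned earlier in the thread.

```sql
-- Hypothetical source table: events_text (id BIGINT, payload STRING).

-- Parquet table compressed with Snappy via the parquet.compression table property.
CREATE TABLE events_parquet (id BIGINT, payload STRING)
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression" = "SNAPPY");

INSERT OVERWRITE TABLE events_parquet
SELECT id, payload FROM events_text;
```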