Created 03-20-2016 08:20 AM
I have a table that is the output of an entire data model. I load new data into it each day, and each day is a partition.
The partition column is 'load_dt', so load_dt='2016-01-01' is a copy of the data model as of the end of the day on 2016-01-01, load_dt='2016-01-02' is a copy as of the end of 2016-01-02, and so on. Each partition is stored as Parquet with Snappy compression.
After a certain amount of time I am far less likely to need fast access to the data, so I am considering re-loading old partitions as textfile with bz2 compression so that they take up less space. The more recent partitions would stay Parquet/Snappy, while older partitions in the same table would be text/bz2.
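The age-based split described above could be sketched roughly as follows. This is only an illustration of picking which load_dt partitions are old enough to re-load; the function name `partitions_to_archive`, the `keep_days` cutoff, and the sample dates are assumptions made up for the example, and the actual re-load into text/bz2 is not shown.

```python
from datetime import date, timedelta


def partitions_to_archive(load_dts, today, keep_days=90):
    """Return the load_dt values strictly older than the cutoff, oldest first.

    load_dts  -- iterable of 'YYYY-MM-DD' partition values (hypothetical input)
    today     -- the reference date
    keep_days -- how many days of partitions stay in the fast format (assumed)
    """
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in load_dts if date.fromisoformat(d) < cutoff)


if __name__ == "__main__":
    parts = ["2016-01-01", "2016-01-02", "2016-03-19"]
    # With a 30-day window ending 2016-03-20, the cutoff is 2016-02-19.
    print(partitions_to_archive(parts, today=date(2016, 3, 20), keep_days=30))
```

Because the values are ISO dates, sorting the strings also sorts them chronologically, so the oldest partitions come first.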
Other than slower performance when I query the older, more highly compressed partitions, are there any other issues I should expect to run into if I try this? Or, if it's not a good idea to mix file formats and compression codecs in the same table, can someone suggest a best practice for archiving? Thanks!