
Separation of hot and cold data - HBase


I have a table named 'X' with a column family 'cf'. The table contains data from the past 5 years. Old data is requested only a few times, whereas recent data is accessed frequently. I want to apply different storage policies to the data based on its age. How can I configure this?


Also, is it possible to specify different compression algorithms for hot and cold data in a single column family? I am asking because the HBase documentation recommends different algorithms for hot and cold data.


Super Collaborator

Hello @sachin_saju 


Thanks for using Cloudera Community. You have two questions in the post:
1. How to configure different storage policies for cold and hot data.

2. Whether different compression algorithms can be applied within one column family.


For Q2, I believe this isn't feasible: the compression algorithm is set at the column family (CF) level, so all data in a CF uses the same algorithm. Review [1] for the compression algorithm recommendations for hot and cold data.
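
To illustrate the CF-level granularity, here is a minimal HBase shell sketch using the table 'X' and column family 'cf' from the question. The choice of SNAPPY is only an assumption for illustration (it requires the codec to be available on the RegionServers); substitute whichever algorithm [1] suggests for your access pattern.

```
# Set one compression algorithm for the whole column family 'cf'.
# SNAPPY is an illustrative choice, not a recommendation.
alter 'X', {NAME => 'cf', COMPRESSION => 'SNAPPY'}

# Confirm the setting on the column family.
describe 'X'

# Existing HFiles keep their old encoding until they are rewritten,
# for example by a major compaction.
major_compact 'X'
```

Because the attribute belongs to the column family, separating hot and cold data by compression would typically mean placing them in different column families or tables.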

For Q1, I assume you are referring to the HDFS storage policy. If so, it is configured uniformly, and I am not sure we can apply different HDFS storage policies to different data within the same CF. In HBase, we generally recommend SSD [2] for the WAL; the rest of the HBase data follows whichever HDFS storage policy is in use. Alternatively, use Backup-Restore [3] to keep a "cold" copy of the data, which can be restored as required.
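
As a rough sketch of what a uniform policy looks like in practice, the HBase shell commands below set one HDFS storage policy for every HFile of 'cf'. This assumes HBase 2.0 or later, where the `hbase.hstore.block.storage.policy` property is available, and assumes the DataNodes actually have SSD volumes configured; ONE_SSD is an illustrative value only.

```
# Apply a single HDFS storage policy to all HFiles of column family 'cf'.
# ONE_SSD is an assumed example; HOT is the usual default.
alter 'X', {NAME => 'cf', CONFIGURATION => {'hbase.hstore.block.storage.policy' => 'ONE_SSD'}}

# Check that the property is now listed on the column family.
describe 'X'

# WAL placement is a separate, cluster-wide setting
# (hbase.wal.storage.policy in hbase-site.xml), not a per-table one.
```

The policy applies to the column family as a whole, which matches the point above that it does not distinguish old rows from recent ones within the same CF.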


Regards, Smarak







Hello @smdas 
Thanks for the response.


These links mention the date tiered compaction policy in HBase. Does it somehow help in configuring different policies for the same column family? Or did I misunderstand?