We have a Hive table over an HBase table, and let's say a few of its columns have the INT datatype, with the data loaded from Hive. If we now want to delete data based on the values in such an INT column, it is not possible in the usual way. This is because the values are stored in binary form, so even an HBase API filter (SingleColumnValueFilter) returns wrong results when we query that column's values from HBase.
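To make the mismatch concrete, here is a minimal Ruby sketch (plain Ruby, no HBase connection needed) comparing the bytes of a number written as text with the bytes of the same number written as a binary integer. It assumes a binary-mapped Hive INT is stored as a 4-byte big-endian signed integer, which is the layout HBase's Bytes.toBytes(int) produces. The helper names are only illustrative and are not part of any HBase or Hive API.

```ruby
# Assumption: a binary-mapped Hive INT is a 4-byte big-endian signed
# integer. All helper names below are hypothetical, for illustration only.

# Bytes you get when the value is written as text, e.g. typed as '100'.
def encode_as_string(value)
  value.to_s.b
end

# Bytes you get when the value is written as a binary INT.
def encode_as_binary_int(value)
  [value].pack('l>') # 'l>' = 32-bit signed integer, big-endian
end

# Turn the 4 stored bytes back into a Ruby Integer.
def decode_binary_int(bytes)
  bytes.unpack('l>').first
end

value = 100
string_bytes = encode_as_string(value)
binary_bytes = encode_as_binary_int(value)

puts string_bytes.bytes.inspect       # [49, 48, 48]
puts binary_bytes.bytes.inspect       # [0, 0, 0, 100]
puts string_bytes == binary_bytes     # false
puts decode_binary_int(binary_bytes)  # 100
```

A filter that compares the stored cell against the text form of the number is comparing [49, 48, 48] with [0, 0, 0, 100], so it can never match. The comparison value has to be encoded the same way as the stored value.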
Problem to solve: How do we purge Hive INT datatype column data from HBase?
This is the first, text-only part of a series covering the resolution of the above problem. In the next part I'll create a small video on running the code and cover other datatypes too.
In such a scenario we can't use the standard API, because we are unable to apply filters on binary column values.
Solution :- The JRuby program code below.
You have probably already heard about the advantages of storing data in HBase (especially in binary block format) and creating a Hive table on top of it to query your data. I am not going to explain the use case for this, or why we required Hive over HBase; the simple reason is better visibility/representation of the data in tabular format.
I came across this problem a few days back, when we needed to purge HBase data after the retention period had ended. We got stuck trying to delete data from the HBase table using the HBase APIs and filters when a particular column (or columns) was of INT data type from Hive.
Below is a sample use case:-
There are two types of storage format for Hive data in HBase:-
Storing data in binary block format in HBase has its own advantages. Below is a script to create sample tables in both HBase and Hive:-