I have a table that collects raw data from a server and stores it in MySQL. I want to move this data to HBase, and I have already migrated it from MySQL to HBase using Sqoop.

My task is to write an Impala query that calculates percentiles (95th and 97th) on this data.

After Sqooping from MySQL to HBase, I created an external table in Hive. One column, which is of type decimal(32,1), is returning NULL values.
```sql
CREATE EXTERNAL TABLE nrd_app_spt.cap (
  hbid             STRING,
  MATRIX_col       STRING,
  MATRIX_VALUE_col DOUBLE,
  COUNTRY_col      STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,RAW:MATRIX,RAW:MATRIX_VAL,RAW:COUNTRY")
TBLPROPERTIES ("hbase.table.name" = "nrd_app_spt:capacity_new");
```
The raw data in the MySQL table has 5 columns: hostname STRING, timestamp STRING, country STRING, collec STRING, matrixvalue DECIMAL(32,1). I Sqooped it and moved it to HBase.
Doubt 1: which column should I use as the row key in HBase? No single column's values are unique: for the same hostname, the same date and time, the same country, and the same collec, I get one matrix value. Since there is no natural unique key, I used hostname and timestamp together as the row key.

Is that a good approach? Any other suggestions?

Also, if two columns form the row key, how do I write queries against them?
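For reference, here is a sketch of how I imagine querying a composite row key from Hive. This assumes the row key was written as hostname + '|' + timestamp with '|' as a separator (the separator and hostname/date values below are made up for illustration):

```sql
-- Sketch only: assumes hbid was loaded as concat(hostname, '|', timestamp).
-- Hive's split() takes a regex, so the pipe must be escaped.
SELECT split(hbid, '\\|')[0] AS hostname,
       split(hbid, '\\|')[1] AS ts,
       MATRIX_VALUE_col
FROM   nrd_app_spt.cap
WHERE  hbid LIKE 'myhost01|2016-05-%';  -- prefix match on hostname + date
```

Putting the hostname first in the key also keeps all rows for one host contiguous in HBase, so a prefix scan like the WHERE clause above stays efficient.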
Doubt 2: I am fetching NULL values for the decimal(32,1) column.
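To narrow down whether the NULLs come from the Hive type mapping rather than from the data itself, I tried the idea of declaring the value column as STRING and casting afterwards. This is only a diagnostic sketch against the same HBase table (the cap_raw table name is made up):

```sql
-- Sketch: read the same HBase column as STRING to see the raw stored value,
-- then cast it back to the intended decimal type.
CREATE EXTERNAL TABLE nrd_app_spt.cap_raw (
  hbid             STRING,
  MATRIX_VALUE_col STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,RAW:MATRIX_VAL")
TBLPROPERTIES ("hbase.table.name" = "nrd_app_spt:capacity_new");

SELECT MATRIX_VALUE_col,
       CAST(MATRIX_VALUE_col AS DECIMAL(32,1)) AS as_decimal
FROM   nrd_app_spt.cap_raw
LIMIT  10;
```

If the STRING column shows values but the cast is NULL, the stored bytes do not match the declared type (for example, binary-encoded cells read with the default string serialization, or vice versa).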
Doubt 3: I need to calculate the 95th and 97th percentiles over all rows for this decimal(32,1) column. Is that possible in Impala or Hive?

Do I need to write a UDF, or is there a built-in function available?
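What I have found so far, as a sketch: Hive ships a built-in percentile_approx UDAF that takes a DOUBLE input, while older Impala releases have no general percentile built-in, so the NTILE analytic function is a common workaround (the bucket-96 choice below is an approximation, not an exact percentile):

```sql
-- Hive: built-in approximate percentiles; input must be DOUBLE.
SELECT percentile_approx(CAST(MATRIX_VALUE_col AS DOUBLE), array(0.95, 0.97))
FROM   nrd_app_spt.cap;

-- Impala workaround: bucket rows into 100 groups with NTILE, then take
-- the smallest value in the bucket just above the target percentile.
SELECT MIN(MATRIX_VALUE_col) AS p95_approx
FROM (
  SELECT MATRIX_VALUE_col,
         NTILE(100) OVER (ORDER BY MATRIX_VALUE_col) AS bucket
  FROM   nrd_app_spt.cap
) t
WHERE  bucket = 96;
```

Is this the right direction, or is there a cleaner built-in I am missing?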