In this article they simply create a new Hive table that is linked to an HBase table. That also works fine for me:
CREATE TABLE mydb.hbase_table (rowkey String, hashvalue int, valuelist String)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:hashValue,cf:valueList");
In the next step I need to create the HFiles by running the following query:
INSERT OVERWRITE TABLE mydb.hbase_table
SELECT concat_ws("_", tester, counter) AS key, hashvalue, valuelist
FROM (SELECT * FROM mydb.orc_table LIMIT 1000000) a
ORDER BY key, hashvalue;
According to the example, that's all there is to do. But when I run this query, the Tez task fails with the well-known "Added a key not lexically larger than previous [...]" error:
Added a key not lexically larger than previous. Current cell = tester1_1/cf:hashValue/1533284948384/Put/vlen=3/seqid=0, lastCell = tester1_1/cf:valueList/1533284948384/Put/vlen=15231/seqid=0
I know that this exception occurs because Tez tried to write the value of the "hashValue" column after writing the "valueList" column value for the same row key. Since the data in an HFile must be sorted by <Key>/<CF>:<CQ>, I would somehow have to ensure that hashValue is always written before valueList for a given row (as "hashValue" is lexically smaller than "valueList").
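To double-check my understanding of the ordering requirement, here is a minimal sketch (plain Python, just simulating the lexicographic comparison HBase applies to <rowkey>/<CF>:<CQ> cell keys; the helper and the emission order shown are my assumptions, not actual HBase code):

```python
# Model an HFile cell key as the tuple it is compared by:
# (rowkey, column family, column qualifier).
def cell_key(rowkey, cf, cq):
    return (rowkey, cf, cq)

# Order in which my Tez task apparently emitted the two cells
# for row "tester1_1" (taken from the error message above):
emitted = [
    cell_key("tester1_1", "cf", "valueList"),  # written first
    cell_key("tester1_1", "cf", "hashValue"),  # written second -> rejected
]

# The HFile writer rejects any cell that is not lexically larger
# than the previously written one.
violation = emitted[1] <= emitted[0]
print(violation)  # True: "hashValue" sorts before "valueList"
```

So within a single row, the cell for cf:hashValue must always be emitted before the cell for cf:valueList, which is exactly what my Tez task does not guarantee.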
But how can I achieve this? And why does the example work? Another idea: is it somehow possible to split the "hashValue" and "valueList" columns into two different Tez tasks? (That hasn't worked for me so far either, and it wouldn't be a good solution anyway.)