I created an external Hive table that refers to a corresponding HBase table (which has a single column family, "cf"):
CREATE EXTERNAL TABLE testdb.testtable (rowkey String, hashvalue String, valuelist String)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:valueList,cf:hashValue")
TBLPROPERTIES ('hbase.table.name' = 'hbase_table');
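Since hbase.columns.mapping is positional (the Nth mapping entry binds to the Nth Hive column), my reading of the DDL above gives the following binding; the comments are only my annotation:

-- Positional binding in the DDL above:
--   rowkey    -> :key          (HBase row key)
--   hashvalue -> cf:valueList
--   valuelist -> cf:hashValue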
Now I want to create HFiles from an existing Hive table (origin_table), which has more than one column: rowkey (= the HBase row key), hashvalue, and valuelist.
When I run the following query to create HFiles from the Hive table, I always get an exception.
Query:
SET hfile.family.path=/tmp/testtable/cf;
SET hive.hbase.generatehfiles=true;
INSERT OVERWRITE TABLE testdb.testtable
SELECT k, valuelist, hashvalue FROM testdb.origin_table DISTRIBUTE BY k SORT BY k;
Exception:
java.io.IOException: Added a key not lexically larger than previous.
Current cell = 055:test_2018-08-28 09:09:31/cf:hashValue/1536343090127/Put/vlen=3/seqid=0,
lastCell = 055:test_2018-08-28 09:09:31/cf:valueList/1536343090127/Put/vlen=11417/seqid=0
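Side note: once the HFiles are generated under /tmp/testtable, the next step would be HBase's bulk-load tool, roughly as below. This is only what I would run afterwards (not the source of the error), and the class name may differ between HBase versions; this is the HBase 1.x form:

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/testtable hbase_table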
This seems to be caused by the two column qualifiers (hashValue and valueList): as far as I understand, the HFile writer requires the cells of a row in lexicographic qualifier order, but cf:hashValue (which sorts before cf:valueList) is written after it. Swapping the column order in the query (SELECT k, hashvalue, valuelist FROM ...) makes no difference; it throws the same exception.
Of course, when I change this example to a Hive table with only one column (plus the key column), the INSERT command works, since there is no second column to write out during HFile creation.
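For reference, the single-column variant that does work for me looks roughly like this (the table name testtable_single is purely illustrative; the rest mirrors the setup above):

CREATE EXTERNAL TABLE testdb.testtable_single (rowkey String, valuelist String)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:valueList")
TBLPROPERTIES ('hbase.table.name' = 'hbase_table');

SET hive.hbase.generatehfiles=true;
SET hfile.family.path=/tmp/testtable/cf;

INSERT OVERWRITE TABLE testdb.testtable_single
SELECT k, valuelist FROM testdb.origin_table DISTRIBUTE BY k SORT BY k;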
My question: how can I create HFiles through this Hive-HBase integration when there is more than one column (plus the key) to transfer?