Hive-HBase Integration: Not possible to create HFiles for more than one Row

I created an external Hive table that refers to a corresponding HBase table (which has one column family "cf"):

create external table testdb.testtable ( rowkey String,  hashvalue String,  valuelist String)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:valueList,cf:hashValue")
TBLPROPERTIES ('hbase.table.name' = 'hbase_table');
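Hive's HBase storage handler pairs the Hive columns with the entries of `hbase.columns.mapping` by position. The Python sketch below is a simplified illustration, not Hive's own parser: the function name `pair_columns` is hypothetical, and it ignores storage suffixes such as `#b` and column-family maps such as `cf:`. It shows which HBase cell each Hive column of testdb.testtable ends up in.

```python
from typing import Dict, List


def pair_columns(hive_columns: List[str], mapping: str) -> Dict[str, str]:
    """Pair Hive columns with hbase.columns.mapping entries by position.

    Hypothetical helper for illustration only; it does not handle
    storage-type suffixes such as '#b' or column-family maps like 'cf:'.
    """
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError("expected one mapping entry per Hive column")
    return dict(zip(hive_columns, entries))


if __name__ == "__main__":
    pairs = pair_columns(
        ["rowkey", "hashvalue", "valuelist"],
        ":key,cf:valueList,cf:hashValue",
    )
    for hive_col, hbase_col in pairs.items():
        print(f"{hive_col:10} -> {hbase_col}")
```

With this DDL, the Hive column `hashvalue` is stored in `cf:valueList` and `valuelist` in `cf:hashValue`; in the INSERT further down, selecting `valuelist` in the second position therefore places its data in `cf:valueList`.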

Now I want to create HFiles from an existing Hive table (origin_table) that has more than one column: rowkey (= HBase row key), hashvalue, and valuelist.

When I run the following query to create HFiles from the Hive table, I always get an exception.


set hive.hbase.generatehfiles=true;

INSERT OVERWRITE TABLE testdb.testtable 
SELECT k, valuelist, hashvalue from testdb.origin_table DISTRIBUTE BY k SORT BY k;

Exception: Added a key not lexically larger than previous. 
Current cell = 055:test_2018-08-28 09:09:31/cf:hashValue/1536343090127/Put/vlen=3/seqid=0, 
    lastCell = 055:test_2018-08-28 09:09:31/cf:valueList/1536343090127/Put/vlen=11417/seqid=0

This seems to occur because the two value columns have different qualifiers (hashValue and valueList) within the same row. Swapping the columns in the query (SELECT k, hashvalue, valuelist FROM ...) makes no difference; it throws the same exception.
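The exception message comes from the HFile writer's ordering check: cells must be appended in key order, comparing row, then column family, then qualifier as unsigned bytes (then timestamp, newest first). Because `b"hashValue"` sorts before `b"valueList"`, appending `cf:hashValue` after `cf:valueList` for the same row is rejected, exactly as the lastCell/current cell pair above shows. The Python sketch below is a toy model of that check, not HBase's implementation; the class `ToyHFileWriter` and the helper `cells_for_row` are hypothetical, and the type byte of real HBase keys is omitted.

```python
from typing import List, Optional, Tuple

# Simplified HBase cell key: (row, family, qualifier, timestamp).
# Real keys also carry a type byte (Put, Delete, ...), omitted here.
CellKey = Tuple[bytes, bytes, bytes, int]


def sort_key(cell: CellKey) -> Tuple[bytes, bytes, bytes, int]:
    row, family, qualifier, ts = cell
    # bytes compare lexicographically as unsigned values, like HBase's
    # comparator; timestamps are negated so newer versions sort first.
    return (row, family, qualifier, -ts)


class ToyHFileWriter:
    """Hypothetical stand-in for an HFile writer that enforces key order."""

    def __init__(self) -> None:
        self.cells: List[CellKey] = []
        self.last: Optional[CellKey] = None

    def append(self, cell: CellKey) -> None:
        # Reject any cell whose key sorts before the previously written one.
        if self.last is not None and sort_key(cell) < sort_key(self.last):
            raise ValueError(
                "Added a key not lexically larger than previous: "
                f"current={cell}, last={self.last}"
            )
        self.cells.append(cell)
        self.last = cell


def cells_for_row(row: bytes, family: bytes, qualifiers: List[bytes], ts: int) -> List[CellKey]:
    """Hypothetical helper: emit one row's cells sorted by qualifier bytes."""
    return [(row, family, q, ts) for q in sorted(qualifiers)]


if __name__ == "__main__":
    row = b"055:test_2018-08-28 09:09:31"
    ts = 1536343090127

    # Order seen in the failing job: valueList, then hashValue.
    writer = ToyHFileWriter()
    writer.append((row, b"cf", b"valueList", ts))
    try:
        writer.append((row, b"cf", b"hashValue", ts))
    except ValueError as err:
        print("rejected:", err)

    # The same cells emitted in qualifier byte order are accepted.
    writer = ToyHFileWriter()
    for cell in cells_for_row(row, b"cf", [b"valueList", b"hashValue"], ts):
        writer.append(cell)
    print("accepted:", [c[2] for c in writer.cells])
```

In this model, a table with a single value column produces only one cell per row, so the within-row qualifier order never comes into play.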

Of course, when I change this example to a Hive table with only one value column (plus the key column), the INSERT command works, since there is no other column to write to the HFile.

My question: how can I create HFiles with the Hive-HBase integration when there is more than one column (plus the key) to transfer?