Created 04-22-2016 06:06 AM
I'm currently practicing some simple HIVE tasks, with my current task to delete from a HIVE table over an existing HBase table. The HIVE table works fine, and was defined as follows:
CREATE EXTERNAL TABLE Test(rowkey STRING, c1 STRING, c2 STRING, c3 STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f1:c1,f1:c2,f1:c3')
TBLPROPERTIES ('hbase.table.name' = 'Test', 'transactional' = 'true');
Reading from this table works fine and is fairly trivial. However, I get an error when I try to delete from this table. My HiveQL for deleting is as follows:
DELETE FROM Test WHERE rowkey = "key1";
The key key1
does exist in the table, and I can retrieve it using HBase shell or HIVE. I won't post the full stack trace since it is very long. However, some excerpts:
ERROR : Vertex failed, vertexName=Reducer 2, vertexId=vertex_1461213316053_0007_1_01, diagnostics=[Task failed, taskId=task_1461213316053_0007_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":null},"value":null}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":null},"value":null}
Caused by: java.io.IOException: HBase row key cannot be NULL
So for some reason it evaluates the row key as NULL, although no queries other than delete ever have it evaluating to NULL. Given this, I tried replacing =
with <=>
which handles NULLs in a more friendly manner. However, I got the same result. This may be a Tez error, it may be HIVE or it may be HBase. I suspect HIVE is sending NULLs through to HBase, but I'm not certain.
Does anybody have any idea how I can remedy this?
Created 04-22-2016 03:27 PM
I understand https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions that Hive ACID (which is necessary for DELETE commands) only works on bucketed ORC tables.
I would expect that even the INSERT wouldn't work when you use 'transactional'='true' without being compliant with the mentioned prerequsites
If you want to have SQL on HBase I would go for Apache Phoenix (https://phoenix.apache.org/)
Created 04-22-2016 03:27 PM
I understand https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions that Hive ACID (which is necessary for DELETE commands) only works on bucketed ORC tables.
I would expect that even the INSERT wouldn't work when you use 'transactional'='true' without being compliant with the mentioned prerequsites
If you want to have SQL on HBase I would go for Apache Phoenix (https://phoenix.apache.org/)