Support Questions
Find answers, ask questions, and share your expertise

Hive HBase Handler Problem


New Contributor

I am using the HBase storage handler in Hive to query a table in HBase that stores its data in Avro format, created as follows.

CREATE EXTERNAL TABLE HBaseAvroTable
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
    "hbase.columns.mapping" = ":key,avro:data",
    "avro.data.serialization.type" = "avro",
    "avro.data.avro.schema.url"="hdfs://localhost/avro/avro_schema.avsc")
TBLPROPERTIES (
    "hbase.table.name" = "avro_table",
    "hbase.mapred.output.outputtable" = "avro_table",
    "hbase.struct.autogenerate"="true");

When inserting data through the Java HBase API, I am serializing the record as follows.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

private byte[] serializeRecord(DataRecord record) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DatumWriter<DataRecord> datumWriter = new SpecificDatumWriter<>(record.getSchema());
    // DataFileWriter emits the Avro container format: header, schema, then records
    DataFileWriter<DataRecord> dataFileWriter = new DataFileWriter<>(datumWriter);
    dataFileWriter.create(record.getSchema(), out);
    dataFileWriter.append(record);
    dataFileWriter.close();
    return out.toByteArray();
}

Because DataFileWriter emits the Avro container format, this stores the schema alongside the record in HBase. A subsequent query in Hive properly deserializes the record and makes it available in Hive.
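For completeness, here is a minimal sketch of the HBase write itself, assuming the table and column mapping from the DDL above (avro_table, row key avro-record-01, column avro:data) and the serializeRecord helper shown earlier; the writeRecord name and the pre-built Connection are illustrative, not part of the original code.

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Writes one Avro-serialized record into the avro:data column (names taken from the DDL above).
private void writeRecord(Connection conn, DataRecord record) throws IOException {
    try (Table table = conn.getTable(TableName.valueOf("avro_table"))) {
        Put put = new Put(Bytes.toBytes("avro-record-01"));          // row key
        put.addColumn(Bytes.toBytes("avro"), Bytes.toBytes("data"),  // family, qualifier
                serializeRecord(record));                            // container-format bytes
        table.put(put);
    }
}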

However when I am inserting a record through hive as follows, the schema itself is not persisted.

INSERT OVERWRITE TABLE HBaseAvroTable
SELECT 'avro-record-01',  named_struct( 'schema_name', schema_name, 
                              'table_name', table_name,
                              'version', version,
                              'process_date', process_date,
                              'metric_list', metric_list ) as d
FROM avro_hive;

While the record is written to HBase, it appears that the Avro serializer uses the binary encoder when serializing the record, which writes only the datum without the container header or embedded schema.
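For comparison, here is a minimal sketch of schema-less binary encoding with the standard Avro API, which produces bytes of the kind seen in the Hive-written cell. This is an assumption about what the serializer does internally, not code taken from the Hive source; the encodeBinary name is illustrative.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumWriter;

// Encodes only the datum: no container header and no embedded schema,
// so the reader must obtain the writer schema some other way.
private byte[] encodeBinary(DataRecord record) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DatumWriter<DataRecord> writer = new SpecificDatumWriter<>(record.getSchema());
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    writer.write(record, encoder);
    encoder.flush();
    return out.toByteArray();
}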

Any help would be appreciated.

-martin

2 REPLIES

Re: Hive HBase Handler Problem

New Contributor

An update on this issue: with a custom schema retriever in place, binary encoding starts to work. However, the following scenarios still have problems.

  1. Insert record via the HBase API - works with binary encoding - can be queried in Hive.
  2. Insert record via Hive - the Avro encoding is not correct; the scan below shows the cell value contains only binary-encoded fields, with no container header or schema.
hbase(main):057:0> scan 'AvroTable'
ROW                                     COLUMN+CELL                                                                                                       
 zz-hive-avro-01                        column=Metric:data, timestamp=1467321017841, value=default\x02table\x021\x022016-06-28 10:55:22\x02RecordCount\x041000\x03AverageVisitNum\x0438.715

When querying through Hive, the following exception occurs.

Caused by: org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorException: Error deserializing avro payload
    at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.deserializeStruct(AvroLazyObjectInspector.java:275)
    at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.getStructFieldData(AvroLazyObjectInspector.java:145)
    at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:117)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:81)
    ... 18 more
Caused by: java.io.EOFException
    at org.apache.avro.io.BinaryDecoder$ByteArrayByteSource.readRaw(BinaryDecoder.java:944)
    at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:349)
    at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
    at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:272)
    at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:113)
    at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:339)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
    at org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable.readFields(AvroGenericRecordWritable.java:115)
    at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.deserializeStruct(AvroLazyObjectInspector.java:273)
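For reference, a minimal sketch of what the custom schema retriever could look like, assuming Hive's org.apache.hadoop.hive.serde2.avro.AvroSchemaRetriever base class and its reflectively invoked (Configuration, Properties) constructor. The class name FixedSchemaRetriever and the SCHEMA_JSON literal are hypothetical stand-ins; the real schema would match avro_schema.avsc from the DDL above.

import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.avro.AvroSchemaRetriever;

// Hypothetical retriever that always hands back one fixed writer schema.
public class FixedSchemaRetriever extends AvroSchemaRetriever {

    // Stand-in schema for illustration only.
    private static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"DataRecord\",\"fields\":["
      + "{\"name\":\"schema_name\",\"type\":\"string\"},"
      + "{\"name\":\"table_name\",\"type\":\"string\"}]}";

    private final Schema schema;

    // Hive instantiates the retriever reflectively with (Configuration, Properties).
    public FixedSchemaRetriever(Configuration conf, Properties tbl) {
        this.schema = new Schema.Parser().parse(SCHEMA_JSON);
    }

    @Override
    public Schema retrieveWriterSchema(Object source) {
        return schema; // schema used to decode the binary-encoded cell bytes
    }
}

The retriever is wired in through a per-column serde property following the pattern of the other properties above, presumably something like "avro.data.avro.schema.retriever" = "com.example.FixedSchemaRetriever"; the exact property name should be checked against the Hive HBase integration docs.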


Re: Hive HBase Handler Problem

When we insert records via Hive, the encoding is not correct. We hit the same issue in the purge module of our application; the linked post below describes our JRuby workaround for identifying records that were inserted through Hive into an INT column.

https://www.linkedin.com/pulse/jruby-code-purge-data-hbase-over-hive-table-mukesh-kumar?trk=mp-reade...