
Hive can't decode avro-serialized data


I have the following Avro schema, “user.avsc”:

http://s16.postimg.org/xeiq0fb1h/user_avsc.png

I used the Kite SDK, so all the fields in the Avro schema are mapped to HBase columns. As a result, I have an HBase table with three columns (b, c and d).

When I insert a record into my HBase table via the CLI, the data is stored in the HBase columns. Then I use Hive (via the HBase storage handler) to view the data stored in that table. I can view columns b and c correctly (with "hbase.table.default.storage.type" = "binary"), but column d is displayed as NULL (Hive returns NULL when it can’t convert the data).

I think the issue is the way the data is encoded before it is stored in HBase. I read that the int, long and String types are encoded by Kite (1), but the other types are Avro-serialized (2) with a special encoding (variable-length zig-zag coding).
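To make the suspected mismatch concrete, here is a minimal Python sketch comparing a fixed-width 4-byte big-endian int with the three Avro bytes reported for column d (\x02\x84\x02). It assumes, without confirmation from Kite or Hive sources, that plain int columns hold the layout HBase's Bytes.toBytes(int) produces and that a binary int mapping expects exactly that layout. Both function names are hypothetical, and read_fixed_width_int is only a stand-in for such a reader, not Hive's actual code.

```python
import struct
from typing import Optional


def hbase_style_int(value: int) -> bytes:
    # ASSUMPTION: plain int columns hold a 4-byte big-endian two's-complement
    # value, the layout HBase's Bytes.toBytes(int) produces.
    return struct.pack(">i", value)


def read_fixed_width_int(cell: bytes) -> Optional[int]:
    # Hypothetical reader: anything other than exactly 4 bytes is treated as
    # unconvertible and mapped to None (shown as NULL in a Hive query).
    if len(cell) != 4:
        return None
    return struct.unpack(">i", cell)[0]


avro_cell = b"\x02\x84\x02"  # value scanned from column d for an input of 130

print(hbase_style_int(130))                        # b'\x00\x00\x00\x82'
print(read_fixed_width_int(hbase_style_int(130)))  # 130
print(read_fixed_width_int(avro_cell))             # None
```

Under these assumptions, the Avro-encoded cell is simply not in the shape a fixed-width int reader expects, which would be consistent with b and c displaying correctly while d shows NULL.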

I did a scan on my table, and for an input of 130 in column d the stored value is “\x02\x84\x02”. This is consistent with how Avro encodes a union ["null", "int"]: the zero-based index of the int branch (1, which zig-zag encodes to 0x02), followed by the int encoding.

Int coding:

  • Zig-zag encoding of 130: 2 × 130 = 260
  • 260 in binary, split into 7-bit groups: 0000010 0000100
  • Variable-length encoding (low-order group first, continuation bit set on every byte except the last): 10000100 00000010
  • In hex: 84 02

index of int branch + int coding = 02 + 84 02 = 02 84 02
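The int coding and union steps above can be reproduced with a short Python sketch of Avro's zig-zag and variable-length encoding. The function names are hypothetical, zigzag() as written covers 32-bit int values only, and the sketch is meant for checking the arithmetic, not as a replacement for an Avro library.

```python
def zigzag(n: int) -> int:
    # Map signed to unsigned so small magnitudes stay small
    # (0 -> 0, -1 -> 1, 1 -> 2, ...). Valid for 32-bit ints.
    return (n << 1) ^ (n >> 31)


def varint(u: int) -> bytes:
    # Emit 7 bits at a time, low-order group first; set the high
    # (continuation) bit on every byte except the last.
    out = bytearray()
    while u > 0x7F:
        out.append((u & 0x7F) | 0x80)
        u >>= 7
    out.append(u)
    return bytes(out)


def encode_avro_int(n: int) -> bytes:
    return varint(zigzag(n))


def encode_nullable_int(value):
    # Union ["null", "int"]: write the zero-based branch index with the same
    # zig-zag varint scheme, then the value itself (nothing for null).
    if value is None:
        return encode_avro_int(0)
    return encode_avro_int(1) + encode_avro_int(value)


print(zigzag(130))                        # 260
print(format(zigzag(130), "b"))           # 100000100
print(encode_avro_int(130).hex(" "))      # 84 02
print(encode_nullable_int(130).hex(" "))  # 02 84 02
```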

 

So the stored data does have a meaning, but I can’t display it correctly in Hive.
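Since the bytes follow Avro's binary encoding, they can be decoded by hand to confirm that the cell really holds 130. This is a minimal Python sketch that assumes the column's schema is the union ["null", "int"] described above; decode_nullable_int and the other helpers are illustrative names, not part of Kite or Hive.

```python
def read_varint(buf: bytes, pos: int):
    # Read a variable-length integer: 7 bits per byte, low-order group
    # first, continuing while the high bit is set.
    result = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def unzigzag(u: int) -> int:
    # Inverse of zig-zag: 0 -> 0, 1 -> -1, 2 -> 1, ...
    return (u >> 1) ^ -(u & 1)


def read_avro_int(buf: bytes, pos: int):
    u, pos = read_varint(buf, pos)
    return unzigzag(u), pos


def decode_nullable_int(cell: bytes):
    # ASSUMED schema: ["null", "int"], so branch 0 = null, branch 1 = int.
    branch, pos = read_avro_int(cell, 0)
    if branch == 0:
        return None
    if branch == 1:
        value, _ = read_avro_int(cell, pos)
        return value
    raise ValueError(f"unexpected union branch {branch}")


print(decode_nullable_int(b"\x02\x84\x02"))  # 130
print(decode_nullable_int(b"\x00"))          # None
```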

Am I doing something wrong? Where can I find something that would help me solve this issue? Is this a Kite problem or a Hive problem?

 

Any help would be appreciated. Thanks in advance.

 

(1) https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-hbase/src/main/java/org/kitesdk/dat...

(2) https://avro.apache.org/docs/1.7.7/spec.html#Data+Serialization