Hive can't decode avro-serialized data

I have the following Avro schema, “user.avsc”:


I used the Kite SDK, so all the fields in the Avro schema are mapped to HBase columns. As a result, I have an HBase table with three columns (b, c and d).


When I insert a record into my HBase table via the CLI, the data is stored in my HBase columns. Then I use Hive (via the HBase storage handler) to view the data stored in my HBase table. I can successfully view columns b and c (with "" = "binary"), but column d is displayed as NULL (Hive returns NULL when it can't convert the data).


I think the issue is the way the data is encoded before it is stored in HBase. I read that the int, long and String types are encoded by Kite (1), but the other types are “Avro-serialized” (2) with a special encoding (variable-length zig-zag coding).
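To make the zig-zag part concrete, here is a minimal Python sketch of the mapping the Avro specification describes for 32-bit ints. It is not Kite's or Avro's actual code, and the function names are hypothetical. Zig-zag interleaves signed values so that numbers close to zero (positive or negative) become small unsigned numbers, which the variable-length step can then store in few bytes.

```python
def zigzag_encode_int(n: int) -> int:
    """Map a signed 32-bit int to an unsigned value (hypothetical helper)."""
    # n >> 31 is an arithmetic shift: 0 for non-negative n, -1 for negative n.
    # Non-negative values become even numbers, negative values odd numbers.
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def zigzag_decode(z: int) -> int:
    """Invert the zig-zag mapping (hypothetical helper)."""
    return (z >> 1) ^ -(z & 1)


for value in [0, -1, 1, -2, 2, 130]:
    print(value, "->", zigzag_encode_int(value))
```

With this mapping 130 becomes 260, the value used in the first step of the calculation that follows.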

I ran a scan on my table, and for an input of 130 in my d column, the stored value is “\x02\x84\x02”. This is consistent with how Avro encodes a union [null, int]: first the zero-based index of the branch that holds the value (1 for int, itself written as a zig-zag long, i.e. 0x02), then the encoding of the int value.


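Int coding:

  • 130 in zig-zag coding = 260 (2 × 130, since the value is non-negative)
  • 260 in binary, split into 7-bit groups = 0000010 0000100
  • Variable-length encoding (low-order group first, high bit set on every byte except the last) = 10000100 00000010
  • The previous result in hex = 84 02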
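The steps above can be reproduced with a short Python sketch. The helper names are hypothetical; the logic follows the zig-zag plus variable-length scheme Avro uses for int values.

```python
def zigzag_int(n: int) -> int:
    # Step 1: zig-zag map a signed 32-bit int to an unsigned value.
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def encode_varint(u: int) -> bytes:
    # Steps 2 and 3: emit 7 bits per byte, low-order group first,
    # setting the high bit on every byte except the last one.
    out = bytearray()
    while u >= 0x80:
        out.append((u & 0x7F) | 0x80)
        u >>= 7
    out.append(u)
    return bytes(out)


def encode_avro_int(n: int) -> bytes:
    return encode_varint(zigzag_int(n))


print(encode_avro_int(130).hex(" "))  # step 4, prints "84 02"
```

`bytes.hex` with a separator argument needs Python 3.8 or later.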


index of the int branch (1, zig-zag coded as 02) + int coding = 02 + 84 02 = 02 84 02
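Going the other way, a small decoder can check that the scanned bytes really are a [null, int] union holding 130. This is a hedged sketch, not Avro's or Kite's API: the function names are hypothetical, and it assumes the null branch is listed first in the union, as in the schema described above.

```python
from typing import Optional, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read an unsigned varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def zigzag_decode(z: int) -> int:
    return (z >> 1) ^ -(z & 1)


def decode_nullable_int(buf: bytes) -> Optional[int]:
    """Decode a value written with the union schema ["null", "int"]."""
    # The union starts with the zero-based branch index, a zig-zag long.
    z, pos = read_varint(buf, 0)
    branch = zigzag_decode(z)
    if branch == 0:  # "null" branch: nothing follows
        return None
    if branch == 1:  # "int" branch: a zig-zag varint follows
        z, pos = read_varint(buf, pos)
        return zigzag_decode(z)
    raise ValueError(f"unexpected union branch {branch}")


print(decode_nullable_int(b"\x02\x84\x02"))  # 130
print(decode_nullable_int(b"\x00"))          # None
```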


So the stored data is meaningful, but I can't display it correctly in Hive.
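One way to see why a binary-mapped int column could come back as NULL is to compare the Avro bytes with a fixed-width layout. The sketch below assumes (not confirmed in this thread) that a binary int column is expected to hold a 4-byte big-endian value, the layout HBase's `Bytes.toBytes(int)` produces. `try_read_fixed_int` is a hypothetical reader that only mimics the "returns NULL when it can't convert" behavior described above; it is not Hive code.

```python
import struct
from typing import Optional


def hbase_bytes_int(n: int) -> bytes:
    # Assumed layout: 4-byte big-endian two's complement, mirroring what
    # HBase's Bytes.toBytes(int) produces in Java.
    return struct.pack(">i", n)


def try_read_fixed_int(cell: bytes) -> Optional[int]:
    # Hypothetical fixed-width reader: anything that is not exactly
    # 4 bytes cannot be interpreted, so it yields None (NULL).
    if len(cell) != 4:
        return None
    return struct.unpack(">i", cell)[0]


avro_cell = b"\x02\x84\x02"        # value seen in the HBase scan
fixed_cell = hbase_bytes_int(130)  # what a fixed-width reader would expect

print(fixed_cell.hex(" "))             # 00 00 00 82
print(try_read_fixed_int(fixed_cell))  # 130
print(try_read_fixed_int(avro_cell))   # None
```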

Am I doing something wrong? Where can I find something to help me solve this issue? Is this a Kite problem or a Hive problem?


Help would be appreciated. Thanks in advance.