
Problem serializing avro union null,int

Hi!

 

I have the following avro schema “user.avsc”:

 avro schema

 

 

I used the Kite SDK, so all the fields in the Avro schema are mapped to HBase columns. So, I have an HBase table with 3 columns (b, c and d).

 

When I insert a record into my HBase table via the CLI, the data is stored in my HBase columns. Then I use Hive to view the data stored in my HBase table (via the HBase handler). I can successfully view the b and c columns (with "hbase.table.default.storage.type" = "binary"), but column d is displayed as NULL (Hive returns NULL when it can’t convert the data).

 

I think the issue here is the way the data is encoded before it is stored in HBase. I read that the int, long and String types are encoded by Kite (1), but the other types are “Avro-serialized” (2) with a special encoding (variable-length zigzag coding).

I did a scan on my table and, for an input of 130 in my d column, the value stored is: “\x02\x84\x02”. This is consistent with the encoding the Avro spec gives for a union [null, int]: index of the selected branch (here 1, for int) + int coding.

 

Int coding:

  1. 130 in zigzag coding = 260
  2. 260 in binary (7-bit groups) = 0000010 0000100
  3. 260 in binary with variable-length coding (low group first, continuation bit set on every byte except the last) = 10000100 00000010
  4. Result of 3) in hex = 84 02

 

index of the int branch (1, zigzag-coded as 02) + int coding = 02 + 84 02 = 02 84 02
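
As a cross-check, the arithmetic above can be reproduced with a few lines of Python. This is only a sketch based on the encoding rules in the Avro spec (2), not Kite's or Avro's actual code; the function names are made up for illustration. Note that it writes only the index of the selected union branch, followed by the value, which gives the three bytes seen in the scan.

```python
def zigzag(n: int) -> int:
    """Map a signed 32-bit int to an unsigned value (zigzag coding)."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def varint(u: int) -> bytes:
    """Emit 7 bits per byte, low group first; a set high bit means more bytes follow."""
    out = bytearray()
    while True:
        group = u & 0x7F
        u >>= 7
        if u:
            out.append(group | 0x80)  # continuation bit
        else:
            out.append(group)
            return bytes(out)


def encode_int(n: int) -> bytes:
    """Zigzag the value, then write it as a variable-length integer."""
    return varint(zigzag(n))


def encode_union_null_int(value):
    """Encode a value of the union [null, int]: branch index, then the value."""
    if value is None:
        return encode_int(0)                  # branch 0 (null) carries no payload
    return encode_int(1) + encode_int(value)  # branch 1 (int)


print(zigzag(130))                        # 260
print(encode_int(130).hex())              # 8402
print(encode_union_null_int(130).hex())   # 028402
```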

 

So the stored data is meaningful, but I can’t get Hive to display it correctly.
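
To illustrate, here is a rough Python sketch that decodes the scanned bytes back to 130 following the rules in the Avro spec (2). The helper names are hypothetical. The last lines rest on an assumption I have not verified against the Hive source: that the "binary" storage type reads an int column as a fixed-width 4-byte big-endian value (the layout produced by HBase's Bytes.toBytes(int)). If that holds, the 3-byte Avro value simply does not have the shape Hive expects.

```python
import struct


def read_varint(buf: bytes, pos: int = 0):
    """Read one variable-length unsigned value; return (value, next position)."""
    shift = 0
    result = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:   # high bit clear: last byte of this value
            return result, pos
        shift += 7


def unzigzag(u: int) -> int:
    """Invert zigzag coding."""
    return (u >> 1) ^ -(u & 1)


def decode_union_null_int(buf: bytes):
    """Decode a [null, int] union: branch index first, then the int if present."""
    raw_branch, pos = read_varint(buf)
    if unzigzag(raw_branch) == 0:
        return None
    raw_value, _ = read_varint(buf, pos)
    return unzigzag(raw_value)


stored = b"\x02\x84\x02"               # value seen in the HBase scan
print(decode_union_null_int(stored))   # 130

# Assumption: a fixed-width reader would expect these 4 bytes for 130.
fixed = struct.pack(">i", 130)
print(fixed.hex())                     # 00000082
print(len(stored) == len(fixed))       # False
```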

Am I doing something wrong? Where can I find something to help me solve this issue? Is this a Kite problem or a Hive problem?

 

Help would be appreciated. Thanks in advance.

 

(1) https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-hbase/src/main/java/org/kitesdk/dat...

(2) https://avro.apache.org/docs/1.7.7/spec.html#Data+Serialization

 
