
Hive/Avro: Schema evolution - attribute size change


How can I evolve the schema when the size of a particular attribute changes?

  • V1 schema:

{
  "name": "sid",
  "type": [
    "null",
    { "type": "fixed", "name": "SID", "namespace": "com.int.datatype", "doc": "", "size": 64 }
  ],
  "doc": "",
  "default": null,
  "businessLogic": ""
}

  • V2 schema:

{
  "name": "sid",
  "type": [
    "null",
    { "type": "fixed", "name": "SID", "namespace": "com.int.datatype", "doc": "", "size": 16 }
  ],
  "doc": "",
  "default": null,
  "businessLogic": ""
}

With the size kept at 64 in the schema associated with the Hive table, we can query the Avro files written with the v1 schema, but not those written with v2:


hive> select sid from avro where dt='2017-12-02' limit 2;
OK
rojbg4ccmpwz
ilknjyclhplm
Time taken: 0.103 seconds, Fetched: 2 row(s)
hive> select sid from avro where dt='2018-02-11' limit 2;
OK
Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found com.int.datatype.SID, expecting com.int.datatype.SID
Time taken: 0.106 seconds


With the size set to 16 in the schema associated with the Hive table, we can query the Avro files written with the v2 schema, but not those written with v1:


hive> select sid from avro where dt='2018-02-11' limit 2;
OK
238d4a8bb307
xpj6nicoaxfl
Time taken: 0.205 seconds, Fetched: 2 row(s)
hive> select sid from avro where dt='2017-12-02' limit 2;
OK
Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found com.int.datatype.SID, expecting com.int.datatype.SID
Time taken: 0.11 seconds
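
The error message prints only the full names of the two fixed types, which are identical, so it hides the actual difference. As a minimal sketch (plain Python, not the Avro library), assuming the matching rule from the Avro specification that two fixed schemas match only when both their names and their sizes are equal, the two field definitions above can be compared like this. The helper names `fixed_branch` and `fixed_mismatch` are hypothetical, and the `businessLogic` attribute is left out for brevity:

```python
import json

# The two field definitions from the question, differing only in "size".
V1 = json.loads("""
{ "name": "sid", "type": [ "null", { "type": "fixed", "name": "SID",
  "namespace": "com.int.datatype", "doc": "", "size": 64 } ],
  "doc": "", "default": null }
""")
V2 = json.loads("""
{ "name": "sid", "type": [ "null", { "type": "fixed", "name": "SID",
  "namespace": "com.int.datatype", "doc": "", "size": 16 } ],
  "doc": "", "default": null }
""")


def fixed_branch(field):
    """Return the fixed schema inside the field's union, or None if absent."""
    for branch in field["type"]:
        if isinstance(branch, dict) and branch.get("type") == "fixed":
            return branch
    return None


def fixed_mismatch(writer_field, reader_field):
    """Describe why the fixed types do not match, or return None if they do."""
    w, r = fixed_branch(writer_field), fixed_branch(reader_field)
    if w["name"] != r["name"]:
        return f"name differs: {w['name']} vs {r['name']}"
    if w["size"] != r["size"]:
        return f"size differs: writer {w['size']}, reader {r['size']}"
    return None


# Files written with v2, table schema still v1 (the first failing query).
print(fixed_mismatch(V2, V1))  # size differs: writer 16, reader 64
```

Swapping the arguments reproduces the second failing query, so neither schema can act as the reader schema for both sets of files.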


Re: Hive/Avro: Schema evolution - attribute size change

Long story short:

Avro provides a schema evolution compatibility check. Please see:

http://bytepadding.com/big-data/spark/avro/avro-schema-compatibility-test/

A few pointers:
1. Avro-encoded records are pure bytes with no per-record schema information.
2. The reader and the writer have to provide the schema at the time of reading (deserialization) and writing (serialization).
3. If you evolve the schema in an incompatible way, you can read only part of the data.
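
To make pointers 1 and 2 concrete for this case, here is a rough sketch of how a `["null", fixed]` union value is laid out, based on the binary encoding described in the Avro specification: the union branch index is written as a zigzag varint, and a fixed value follows as exactly `size` raw bytes with no length prefix. This is hand-written illustration code, not the Avro library, and the function names are hypothetical:

```python
def zigzag_varint(n: int) -> bytes:
    """Encode an int the way Avro encodes a long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_nullable_fixed(value, size):
    """Encode a value of the union ["null", fixed(size)]."""
    if value is None:
        return zigzag_varint(0)  # branch 0 = null, no payload
    if len(value) != size:
        raise ValueError(f"fixed value must be exactly {size} bytes")
    # Branch 1 = fixed: the raw bytes follow, with no length written.
    return zigzag_varint(1) + value


def decode_nullable_fixed(buf, size):
    """Decode the same union; the size can only come from the reader's schema."""
    branch = buf[0] >> 1  # undo zigzag; valid for the small indexes used here
    if branch == 0:
        return None
    payload = buf[1:1 + size]
    if len(payload) != size:
        raise ValueError(f"expected {size} bytes, found {len(payload)}")
    return payload


data = encode_nullable_fixed(b"x" * 16, 16)      # written with the v2 size
print(len(data))                                  # 17: one index byte + 16 raw bytes
print(decode_nullable_fixed(data, 16) == b"x" * 16)
```

Calling `decode_nullable_fixed(data, 64)` on the same bytes raises `ValueError`, because nothing in the encoded record says how long the value is. The real Avro reader catches the difference earlier, when it resolves the writer schema against the reader schema, which is where the `AvroTypeException` above comes from.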


Re: Hive/Avro: Schema evolution - attribute size change


The issue is that the writer schema uses a different size than the reader schema, and that causes the compatibility issue. Is there any workaround for this?