
Byte array length and length of Text differ if the length of the string is greater than 9

Hi,

Please explain the behaviour below.

Text text = new Text("sampletext");

System.out.println(" length from txt " + text.getLength());               // printed 10
System.out.println(" length from txt bytes " + text.getBytes().length);   // printed 11, why?

 

For any string longer than 9 characters, the length of the Text and the length of the byte array differ by 1; for strings longer than 19 characters they differ by 2, and so on.

Can you explain this behaviour?


Re: Byte array length and length of Text differ if the length of the string is greater than 9

Text serialisation writes the length before the encoded bytes of the string. The length is required to deserialise it back.
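
To make the length-prefix idea concrete, here is a minimal sketch using only the JDK. It is not Hadoop's code: LengthPrefixedDemo, serialise and deserialise are made-up names, and the fixed 4-byte int prefix is a simplification (as far as I know, Text writes a variable-length integer instead).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of length-prefixed string serialisation.
public class LengthPrefixedDemo {

    // Writes a 4-byte length (assumption: fixed-width, unlike Hadoop's vint)
    // followed by the UTF-8 bytes of the string.
    static byte[] serialise(String s) {
        byte[] payload = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(payload.length);
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Reads the length first, so it knows exactly how many bytes belong to the string.
    static String deserialise(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int length = in.readInt();
            byte[] payload = new byte[length];
            in.readFully(payload);
            return new String(payload, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = serialise("sampletext");
        System.out.println("serialised size: " + wire.length);   // 4-byte prefix + 10 bytes = 14
        System.out.println("round trip: " + deserialise(wire));
    }
}
```

Without the prefix, a reader consuming several strings from one stream would have no way to tell where one ends and the next begins.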

When you are looking for explanations like this, it helps to look at the sources. See how we serialise and deserialise Text objects at https://github.com/cloudera/hadoop-common/blob/cdh5.4.0-release/hadoop-common-project/hadoop-common/... (serialise) and https://github.com/cloudera/hadoop-common/blob/cdh5.4.0-release/hadoop-common-project/hadoop-common/... (deserialise)
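
Separately from serialisation, it may be worth knowing that, as far as the Text Javadoc describes it, getBytes() returns the backing array and only the first getLength() bytes of it are valid. The sketch below uses only the JDK (no Hadoop classes) and assumes Text fills that array through a java.nio CharsetEncoder, which is my reading of the source rather than something confirmed in this thread. BackingArrayDemo and its helpers are made-up names, and the exact amount of slack in the array is a JDK implementation detail that may vary.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration: an encoder's output buffer can be larger than the data in it.
public class BackingArrayDemo {

    // Encodes a string to UTF-8 with a CharsetEncoder. The returned buffer's
    // limit() is the number of valid bytes; array().length is its capacity.
    static ByteBuffer encode(String s) {
        try {
            return StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s));
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Copies only the valid bytes, dropping any unused tail of the backing array.
    static byte[] validBytes(ByteBuffer bb) {
        return Arrays.copyOf(bb.array(), bb.limit());
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ninechars", "sampletext", "twenty characters.."}) {
            ByteBuffer bb = encode(s);
            System.out.println(s.length() + " chars: valid bytes = " + bb.limit()
                    + ", backing array length = " + bb.array().length);
        }
    }
}
```

If you need an array of exactly the valid length from a Text, I believe copyBytes() is the intended call; otherwise read getBytes() only up to getLength().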

Since you are looking at serialisation internals, you may also be interested in the Avro serialisation specification page, although it is somewhat unrelated to Writables: http://avro.apache.org/docs/current/spec.html#Encodings