Byte array length and length of Text differ if the length of the string is greater than 9

Hi,

 

Please explain the behaviour below.

 

import org.apache.hadoop.io.Text;

Text text = new Text("sampletext");

System.out.println("length from text: " + text.getLength());             // printed 10
System.out.println("length from text bytes: " + text.getBytes().length); // printed 11, why?

 

For any string longer than 9 characters, the length of the Text and the length of its byte array differ by 1. If the string is longer than 19 characters, they differ by 2, and so on.

 

Can you explain this behaviour?

Re: Byte array length and length of Text differ if the length of the string is greater than 9

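Text serialisation writes the length before the encoded string bytes. This is required to deserialise it back. Note, however, that getBytes() returns the raw backing array rather than the serialised form: only the first getLength() bytes of it are valid, and copyBytes() returns an array of exactly that length.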
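One way to see where an extra byte in the backing array can come from is to run the encode step on its own, without Hadoop. The sketch below assumes Text fills its buffer through a java.nio CharsetEncoder, as the linked sources suggest, and the class name EncoderCapacityDemo is made up for illustration. The encoder sizes its output buffer from an estimated bytes-per-character ratio, so the array can be larger than the number of bytes actually written. On a typical OpenJDK the UTF-8 estimate is about 1.1, which would account for one spare byte from 10 characters and two from 20, but that ratio is an implementation detail and not guaranteed.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: mimics the encode step only, it is not Hadoop's Text.
class EncoderCapacityDemo {

    // Returns { number of valid bytes, size of the backing array } for a string.
    static int[] lengths(String s) {
        try {
            ByteBuffer bb = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s));
            // limit() is the count of bytes actually written;
            // array().length is the capacity the encoder chose to allocate.
            return new int[] { bb.limit(), bb.array().length };
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("input is not encodable as UTF-8", e);
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] { "nine char", "sampletext", "twenty characters ok" }) {
            int[] l = lengths(s);
            System.out.println(s.length() + " chars: valid=" + l[0] + ", array=" + l[1]);
        }
    }
}
```

If the two numbers differ for your input, that is the same gap you see between getLength() and getBytes().length. Code that needs exactly the valid bytes should read only the first getLength() bytes of the array.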

If you are looking for explanations like this, it helps to read the sources. See how we serialise and deserialise Text objects at https://github.com/cloudera/hadoop-common/blob/cdh5.4.0-release/hadoop-common-project/hadoop-common/... (serialise) and https://github.com/cloudera/hadoop-common/blob/cdh5.4.0-release/hadoop-common-project/hadoop-common/... (deserialise)
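As a rough illustration of the length-prefixed layout those sources implement, here is a minimal plain-JDK sketch. It is not the Hadoop code: the class and method names are invented, and it writes a fixed 4-byte int for the length where Text uses a variable-length integer. The prefix exists only in the serialised output, so it does not show up in getBytes().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Hadoop code: fixed 4-byte length prefix instead of a vint.
class LengthPrefixedDemo {

    static byte[] serialise(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(utf8.length); // length first, so the reader knows where the string ends
            out.write(utf8);           // then exactly that many bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    static String deserialise(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] utf8 = new byte[in.readInt()]; // read the length prefix
            in.readFully(utf8);                   // then read that many bytes
            return new String(utf8, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = serialise("sampletext");
        System.out.println(wire.length + " bytes on the wire -> " + deserialise(wire));
    }
}
```

Without the prefix, a reader pulling several strings off one stream would have no way to tell where one ends and the next begins.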

Since you are looking at serialisation internals, you may also be interested in the Avro serialisation specification, although it is largely unrelated to Writables: http://avro.apache.org/docs/current/spec.html#Encodings