Support Questions
Find answers, ask questions, and share your expertise

Loading binary encoded data into hive avro table


I created the table as below:

CREATE EXTERNAL TABLE testtbl
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/tmp/staging3'
TBLPROPERTIES ('avro.schema.url'='hdfs:///tmp/avroschemas/testtbl.json');
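For reference, the schema file pointed to by avro.schema.url would be an Avro record schema, something along these lines (the field names here are made up for illustration, since the actual testtbl.json was not shared):

```json
{
  "type": "record",
  "name": "testtbl",
  "namespace": "example",
  "fields": [
    {"name": "id",   "type": "long"},
    {"name": "name", "type": "string"}
  ]
}
```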

Then a user sent a binary-encoded data file through Kafka/Flume into /tmp/staging3 in HDFS.

Whatever was loaded in, when I tried to select from the table I got:

hive> select * from testtbl;

OK
Failed with exception java.io.IOException:java.io.IOException: Not a data file.

So my questions are:

1. If the data is binary encoded, should we do anything special in the CREATE TABLE statement above?

2. Is there any way for me to check whether the binary encoding is correct?

I'd appreciate any insights to troubleshoot this issue.


Re: Loading binary encoded data into hive avro table

Hello @n c!
AFAIK you can load binary-encoded data into an Avro table as long as the binary content is actually Avro-compatible, i.e. a proper Avro container file rather than just raw binary-encoded datums.
For further details, take a look at this link:

https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/

You can give avro-tools a shot to figure out whether the binary data coming from Kafka/Flume has the Avro schema embedded in it.
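If avro-tools isn't handy, a quick sanity check is to look at the first four bytes of the file: per the Avro spec, every object container file starts with the magic sequence `Obj` followed by byte 0x01. A minimal sketch (the file paths here are just placeholders):

```python
# Check whether a file is an Avro container file by inspecting its magic bytes.
# Per the Avro spec, every container file begins with b'Obj' followed by 0x01.

AVRO_MAGIC = b"Obj\x01"

def is_avro_container(path):
    """Return True if the file starts with the Avro container magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == AVRO_MAGIC

# A file holding only raw binary-encoded datums (no container header) fails
# this check -- which is exactly what Hive's "Not a data file" error means.
with open("/tmp/raw_datums.bin", "wb") as f:
    f.write(b"\x02\x06foo")  # a raw Avro binary datum, no container header

print(is_avro_container("/tmp/raw_datums.bin"))  # prints False
```

If this check fails on the files landing in /tmp/staging3, the Kafka/Flume pipeline is writing raw encoded records rather than Avro container files, and Hive's AvroSerDe cannot read them.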

Lastly, take a look at this link; it describes which encodings Avro accepts:

https://avro.apache.org/docs/1.8.1/spec.html#Encodings
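To make the distinction concrete: raw binary-encoded datums only become readable by the AvroSerDe once they are framed in the object container format (magic bytes, schema metadata, sync markers, data blocks). A rough stdlib-only sketch of that framing, using a made-up two-field schema for illustration (in practice you would let avro-tools or an Avro library do this rather than hand-rolling it):

```python
import json
import os

def zigzag_varint(n):
    """Encode a long with Avro's zigzag + variable-length encoding."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def avro_bytes(b):
    """Avro 'bytes': length (long) followed by the raw bytes."""
    return zigzag_varint(len(b)) + b

def avro_string(s):
    """Avro 'string': length-prefixed UTF-8 bytes."""
    return avro_bytes(s.encode("utf-8"))

# Hypothetical schema matching a two-column table (illustrative only).
SCHEMA = {
    "type": "record", "name": "testtbl",
    "fields": [{"name": "id", "type": "long"},
               {"name": "name", "type": "string"}],
}

def encode_record(rec):
    """Binary-encode one record: fields in schema order, no per-record header."""
    return zigzag_varint(rec["id"]) + avro_string(rec["name"])

def write_container(path, records):
    """Wrap binary-encoded records in an Avro object container file."""
    sync = os.urandom(16)
    body = b"".join(encode_record(r) for r in records)
    with open(path, "wb") as f:
        f.write(b"Obj\x01")                        # magic bytes
        f.write(zigzag_varint(2))                  # metadata map: 2 entries
        f.write(avro_string("avro.schema"))
        f.write(avro_bytes(json.dumps(SCHEMA).encode("utf-8")))
        f.write(avro_string("avro.codec"))
        f.write(avro_bytes(b"null"))               # no compression
        f.write(zigzag_varint(0))                  # end of metadata map
        f.write(sync)                              # sync marker
        f.write(zigzag_varint(len(records)))       # data block: record count
        f.write(zigzag_varint(len(body)))          # data block: byte length
        f.write(body)
        f.write(sync)                              # block terminator

write_container("/tmp/testtbl_demo.avro", [{"id": 1, "name": "foo"}])
```

The body bytes here are exactly what a "binary encoded" producer emits; the surrounding header, metadata, and sync markers are the part your Kafka/Flume pipeline is apparently not writing, which is why Hive reports "Not a data file".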

Hope this helps