I created a table as below:
CREATE EXTERNAL TABLE testtbl
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/tmp/staging3';
Then a user sent a binary-encoded data file through Kafka/Flume into /tmp/staging3 in HDFS.
When I try to select from whatever was loaded in, I get:
hive> select * from testtbl;
Failed with exception java.io.IOException:java.io.IOException: Not a data file.
So my questions are:
1. If the data is binary encoded, should we do anything special in the CREATE TABLE statement above?
2. Is there any way for me to check whether the binary encoding is correct?
I'd appreciate any insights to troubleshoot this issue.
Hello @n c! AFAIK you can use Avro with binary-encoded data, as long as the binary content is Avro-compatible. For further details, take a look at this link.
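To answer question 1: the AvroSerDe usually needs to know the schema, either embedded in the Avro container files themselves or declared on the table. A minimal sketch of declaring it on the table via TBLPROPERTIES (the record name and fields here are illustrative, not your actual schema):

```sql
CREATE EXTERNAL TABLE testtbl
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/tmp/staging3'
TBLPROPERTIES ('avro.schema.literal'='{
  "type": "record",
  "name": "TestRecord",
  "fields": [ {"name": "id", "type": "long"} ]
}');
```

Note that even with a declared schema, the files under the LOCATION must still be Avro container files; raw binary-encoded records without the container framing will produce the "Not a data file" error you are seeing.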
You can give it a shot with avro-tools to figure out whether the binary data coming from Kafka/Flume has an Avro schema embedded in it.
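As a quick stdlib-only alternative to avro-tools (a sketch; the helper name is mine), you can check for the 4-byte magic header that every Avro container file starts with, the ASCII bytes "Obj" followed by 0x01. That header is exactly what Hive's Avro reader looks for, so its absence is what triggers "Not a data file":

```python
# Avro object container files begin with the 4-byte magic b"Obj\x01"
# (ASCII "Obj" plus a version byte of 1, per the Avro 1.x spec).
AVRO_MAGIC = b"Obj\x01"

def is_avro_container(path):
    """Return True if the file at `path` starts with the Avro container magic."""
    with open(path, "rb") as f:
        return f.read(4) == AVRO_MAGIC
```

If this returns False for the files in /tmp/staging3, the data is raw binary-encoded Avro (or something else entirely) rather than container files, and Hive will not be able to read it as-is.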
Lastly, take a look at this link; it lists which encodings Avro accepts for data.
Hope this helps