How to create and store Avro files in a Hive table?

I am trying to create a Hive table for storing an Avro file. I have stored my Avro schema (.avsc file) and my Avro file in a single location.

Could anyone help me create the table in Hive?

1 ACCEPTED SOLUTION

Master Guru

If you already have your Avro file and Avro schema, upload them to HDFS and use

CREATE EXTERNAL TABLE my_avro_tbl
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION '/user/...'
  TBLPROPERTIES ('avro.schema.url'='hdfs://name-node.fqdn:8020/user/.../schema.avsc');

If your Avro file already contains the schema in its header, you can just say

CREATE EXTERNAL TABLE tbl_name (... declarations ...) STORED AS AVRO LOCATION '...';

without specifying the schema. I have been testing this for the last few days and can confirm that it works on HDP-2.4 (Hive-1.2) for all scalar types such as string, int, float, double, and boolean. If you are using complex types (like union), it might not work.
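As a rough illustration of what the "declarations" part could look like, here is a minimal sketch. The table name, the column names and types, and the location are all hypothetical placeholders, and it assumes the declared columns match the field names and scalar types stored in your Avro files.

```sql
-- Sketch only: my_avro_events, its columns, and the path are made-up placeholders.
-- The column list is assumed to mirror the fields in the Avro files.
CREATE EXTERNAL TABLE my_avro_events (
  id     INT,
  name   STRING,
  score  DOUBLE,
  active BOOLEAN
)
STORED AS AVRO
LOCATION '/path/to/avro/dir';

-- Quick check that Hive can read the files.
SELECT * FROM my_avro_events LIMIT 10;
```

The STORED AS AVRO shorthand is available from Hive 0.14 onward; on older versions the longer SERDE / INPUTFORMAT / OUTPUTFORMAT form shown earlier is needed.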

13 REPLIES

Rising Star

The issue I am having is that I have Avro objects that don't have the schema in the header. When I try to access these objects by specifying the schema via the avro.schema.url parameter in TBLPROPERTIES, I am unable to access the data. However, if the Avro object includes the schema, I have no problem.

I'm pretty sure that the Avro objects are OK, as I can extract the data from them using avro-tools and providing the same schema. So what I would be interested in seeing is whether someone can load a schema-less Avro object into HDFS and then get a Hive table to access it by providing the schema file.

Contributor

@Predrag Minovic could you please let me know how you created the Hive Avro table from a Hive text table? Did you select from the text table and create the Avro table using CREATE TABLE AS SELECT?
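For reference, the CREATE TABLE AS SELECT approach asked about here could look roughly like the sketch below. The table names are hypothetical, and this is not necessarily how the tables in this thread were built. It assumes Hive 0.14 or later, where STORED AS AVRO is supported, and it creates a managed table rather than an external one.

```sql
-- Sketch only: my_text_tbl and my_avro_copy are made-up table names.
-- Hive derives the Avro schema from the columns of the SELECT.
CREATE TABLE my_avro_copy
STORED AS AVRO
AS
SELECT * FROM my_text_tbl;
```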

Master Guru

Okay, please upload your files somewhere (to one of your existing questions, or a new one), and I'll try to read them with Hive.

When you say "declarations" above, do you mean actually defining the columns and data types? What is the syntax for that? I am writing HDFS files from NiFi, so it stores the schema header in the Avro file. I want to leverage that to create the Hive table, but I do not have a good example of how to do that. If you could elaborate on the declarations part with an example of a few columns, it would really help.