Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

how to create and store the avro files in hive table?

Solved Go to solution

how to create and store the avro files in hive table?

New Contributor

i was trying to create hive table for storing avro file and i have stored my avro shema(.avsc file),my avro file in single location.

could anyone help me to create the table in hive?

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: how to create and store the avro files in hive table?

If you already have your Avro file and Avro schema, upload them to HDFS and use

CREATE EXTERNAL TABLE my_avro_tbl
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION '/user/...'
  TBLPROPERTIES ('avro.schema.url'='hdfs://name-node.fqdn:8020/user/.../schema.avsc');

If your Avro file already contains the schema in its header you can just say

CREATE EXTERNAL TABLE tbl-name(... declarations ...) STORED AS AVRO LOCATION '...';

without specifying the schema. I have been testing it last few days and can confirm that it works on HDP-2.4 (Hive-1.2) for all scalar types like string, int, float, double, boolean etc. If you are using some complex types (like union) it might not work.

13 REPLIES 13

Re: how to create and store the avro files in hive table?

Hi @manikandan ayyasamy,

Have you looked to the Hive doc on AvroSerDe : https://cwiki.apache.org/confluence/display/Hive/AvroSerDe ? you have examples on how to do it following your Hive version.

Please try to implement it and ask the community if you have a particular problem with your approach.

Note also that ORC can give the best performance with cost base optimization and Vectorization. Give it a try if this is a viable option.

Re: how to create and store the avro files in hive table?

New Contributor

i have tried the following,but i am not able to create the table.

create external table tweets(name string,time int,tweet string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.AvroSerDe'

STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'

OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'

TBLPROPERTIES ( 'avro.schema.url'='http://trvlhaddv2gw1.tsh.thomson.com:8888/filebrowser/view/user/c400351/tweet.avsc');

i did not get any error messages.

Re: how to create and store the avro files in hive table?

What's the problem you are facing If you didn't get any error messages ? can you see the table tweets when you do :

show tables;

Re: how to create and store the avro files in hive table?

@manikandan ayyasamy

Ok, from reading your several posts and comments I think that Hive is not able to access your schema on HDFS. I see that you are trying to give the schema that's stored on HDFS. However, you are using http and port 8888 which is not good. Can you try with this modification

TBLPROPERTIES ('avro.schema.url'='hdfs://trvlhaddv2gw1.tsh.thomson.com:8020/PATH_TO_YOUR_AVSC_FILE');

Re: how to create and store the avro files in hive table?

New Contributor

Hi Mani,

Below are my observations

1. specifying schema in your DDL and also the Avro schema. i.e column names as part of DDL and Avro URL. use only one.

2. Is the schema in provided URL accessible from HDFS? you can move it HDFS and test

3. The serde is pointing to hcatalog, try using org.apache.hadoop.hive.serde2.avro.AvroSerDe

Re: how to create and store the avro files in hive table?

You are missing the "LOCATION" for your external table. Uplaod your Avro file somewhere in HDFS and provide the directory where your Avro file is located as your LOCATION. There is also no need to declare name, time etc., they are given in your avsc file. See my answer. After that try "SELECT * FROM tweets LIMIT 10;".

Highlighted

Re: how to create and store the avro files in hive table?

If you already have your Avro file and Avro schema, upload them to HDFS and use

CREATE EXTERNAL TABLE my_avro_tbl
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION '/user/...'
  TBLPROPERTIES ('avro.schema.url'='hdfs://name-node.fqdn:8020/user/.../schema.avsc');

If your Avro file already contains the schema in its header you can just say

CREATE EXTERNAL TABLE tbl-name(... declarations ...) STORED AS AVRO LOCATION '...';

without specifying the schema. I have been testing it last few days and can confirm that it works on HDP-2.4 (Hive-1.2) for all scalar types like string, int, float, double, boolean etc. If you are using some complex types (like union) it might not work.

Re: how to create and store the avro files in hive table?

Contributor

@Predrag Minovic So I've been battling with Avro and Hive for days and am getting nowhere. See thread here and here. If you can shed any light, it would be greatly appreciated.

Re: how to create and store the avro files in hive table?

Mark, here is what I did: Having a csv file as an input I created a Hive text table, loaded csv file onto the text table, created an Avro table "STORED AS AVRO", and inserted all recored from the text table into the avro table. I also tried to create an Avro table using a schema file. Then I tested external tables using avro table data and scheme file. I'll create an article and post this week.

Don't have an account?
Coming from Hortonworks? Activate your account here