Created 05-10-2016 07:51 AM
i was trying to create hive table for storing avro file and i have stored my avro shema(.avsc file),my avro file in single location.
could anyone help me to create the table in hive?
Created 05-10-2016 11:08 AM
If you already have your Avro file and Avro schema, upload them to HDFS and use
CREATE EXTERNAL TABLE my_avro_tbl ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/user/...' TBLPROPERTIES ('avro.schema.url'='hdfs://name-node.fqdn:8020/user/.../schema.avsc');
If your Avro file already contains the schema in its header you can just say
CREATE EXTERNAL TABLE tbl-name(... declarations ...) STORED AS AVRO LOCATION '...';
without specifying the schema. I have been testing it last few days and can confirm that it works on HDP-2.4 (Hive-1.2) for all scalar types like string, int, float, double, boolean etc. If you are using some complex types (like union) it might not work.
Created 05-10-2016 09:24 AM
Have you looked to the Hive doc on AvroSerDe : https://cwiki.apache.org/confluence/display/Hive/AvroSerDe ? you have examples on how to do it following your Hive version.
Please try to implement it and ask the community if you have a particular problem with your approach.
Note also that ORC can give the best performance with cost base optimization and Vectorization. Give it a try if this is a viable option.
Created 05-10-2016 01:49 PM
i have tried the following,but i am not able to create the table.
create external table tweets(name string,time int,tweet string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ( 'avro.schema.url'='http://trvlhaddv2gw1.tsh.thomson.com:8888/filebrowser/view/user/c400351/tweet.avsc');
i did not get any error messages.
Created 05-10-2016 01:56 PM
What's the problem you are facing If you didn't get any error messages ? can you see the table tweets when you do :
show tables;
Created 05-10-2016 04:26 PM
Ok, from reading your several posts and comments I think that Hive is not able to access your schema on HDFS. I see that you are trying to give the schema that's stored on HDFS. However, you are using http and port 8888 which is not good. Can you try with this modification
TBLPROPERTIES ('avro.schema.url'='hdfs://trvlhaddv2gw1.tsh.thomson.com:8020/PATH_TO_YOUR_AVSC_FILE');
Created 10-20-2016 05:11 PM
Hi Mani,
Below are my observations
1. specifying schema in your DDL and also the Avro schema. i.e column names as part of DDL and Avro URL. use only one.
2. Is the schema in provided URL accessible from HDFS? you can move it HDFS and test
3. The serde is pointing to hcatalog, try using org.apache.hadoop.hive.serde2.avro.AvroSerDe
Created 05-10-2016 02:23 PM
You are missing the "LOCATION" for your external table. Uplaod your Avro file somewhere in HDFS and provide the directory where your Avro file is located as your LOCATION. There is also no need to declare name, time etc., they are given in your avsc file. See my answer. After that try "SELECT * FROM tweets LIMIT 10;".
Created 05-10-2016 11:08 AM
If you already have your Avro file and Avro schema, upload them to HDFS and use
CREATE EXTERNAL TABLE my_avro_tbl ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/user/...' TBLPROPERTIES ('avro.schema.url'='hdfs://name-node.fqdn:8020/user/.../schema.avsc');
If your Avro file already contains the schema in its header you can just say
CREATE EXTERNAL TABLE tbl-name(... declarations ...) STORED AS AVRO LOCATION '...';
without specifying the schema. I have been testing it last few days and can confirm that it works on HDP-2.4 (Hive-1.2) for all scalar types like string, int, float, double, boolean etc. If you are using some complex types (like union) it might not work.
Created 05-10-2016 02:37 PM
@Predrag Minovic So I've been battling with Avro and Hive for days and am getting nowhere. See thread here and here. If you can shed any light, it would be greatly appreciated.
Created 05-10-2016 02:48 PM
Mark, here is what I did: Having a csv file as an input I created a Hive text table, loaded csv file onto the text table, created an Avro table "STORED AS AVRO", and inserted all recored from the text table into the avro table. I also tried to create an Avro table using a schema file. Then I tested external tables using avro table data and scheme file. I'll create an article and post this week.