- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Multiple methods to create AVRO based Hive table
- Labels:
-
Apache Hive
Created ‎06-12-2017 02:44 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
We can create the same table using one of the below two queries:
I have seen that they both result in the same table.
so how do they differ? and if they differ, when do I use one over the other?
CREATE TABLE sample_table STORED AS AVRO TBLPROPERTIES('avro.schema.url' = '<some location>');
CREATE TABLE sample_table ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ('avro.schema.url'='file:///tmp/schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
Created ‎06-12-2017 09:26 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
They are the same. Hive has a few shortcuts for common SerDes and Avro is one of them. You can use one of the following, or specify your own INPUTFORMAT and OUTPUTFORMAT classes.
file_format: : SEQUENCEFILE | TEXTFILE -- (Default, depending on hive.default.fileformat configuration) | RCFILE -- (Note: Available in Hive 0.6.0 and later) | ORC -- (Note: Available in Hive 0.11.0 and later) | PARQUET -- (Note: Available in Hive 0.13.0 and later) | AVRO -- (Note: Available in Hive 0.14.0 and later) | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
Created ‎06-12-2017 09:26 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
They are the same. Hive has a few shortcuts for common SerDes and Avro is one of them. You can use one of the following, or specify your own INPUTFORMAT and OUTPUTFORMAT classes.
file_format: : SEQUENCEFILE | TEXTFILE -- (Default, depending on hive.default.fileformat configuration) | RCFILE -- (Note: Available in Hive 0.6.0 and later) | ORC -- (Note: Available in Hive 0.11.0 and later) | PARQUET -- (Note: Available in Hive 0.13.0 and later) | AVRO -- (Note: Available in Hive 0.14.0 and later) | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
Created ‎06-13-2017 02:47 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you!
Then if I am to use one of the common SerDes, Avro in this case, I can get by with just
CREATE TABLE sample_table STORED AS AVRO TBLPROPERTIES('avro.schema.url' = '<some location>');
rather than use the longer format?
Created ‎06-13-2017 06:40 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yep, that should work just fine. Longer format is useful if you want to use custom SerDes such as the OpenCSV serde, but the table created using STORED AS AVRO will be identical as the one created explicitly using the org.apache.hadoop.hive.serde2.avro.AvroSerDe.
Created ‎06-14-2017 02:32 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much for the explanations!
