Created 06-12-2017 02:44 PM
It looks like the same table can be created with either of the two queries below.
As far as I can tell, both result in the same table.
So how do they differ? And if they do differ, when should I use one over the other?
CREATE TABLE sample_table STORED AS AVRO TBLPROPERTIES('avro.schema.url' = '<some location>');
CREATE TABLE sample_table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.url'='file:///tmp/schema.avsc')
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
Created 06-12-2017 09:26 PM
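They are the same. Hive has a few shortcuts for common SerDes and Avro is one of them: STORED AS AVRO is shorthand for the AvroSerDe together with the Avro container input and output formats. You can use one of the following file_format keywords, or specify your own INPUTFORMAT and OUTPUTFORMAT classes.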
file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
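If you want to check the equivalence yourself, one way is to create both variants side by side and compare their metadata. This is only a sketch: the table names sample_short and sample_long are made up for illustration, and it assumes the schema file from the question exists at file:///tmp/schema.avsc.

```sql
-- Hypothetical table names; the schema path is the one from the question
-- and is assumed to exist on the machine running Hive.

-- Short form: STORED AS AVRO fills in the SerDe and both format classes.
CREATE TABLE sample_short
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url' = 'file:///tmp/schema.avsc');

-- Long form: the same three classes spelled out explicitly.
CREATE TABLE sample_long
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.url' = 'file:///tmp/schema.avsc')
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';

-- Compare the "SerDe Library", "InputFormat" and "OutputFormat" rows.
DESCRIBE FORMATTED sample_short;
DESCRIBE FORMATTED sample_long;
```

The SerDe library and the input/output format classes should match in both outputs. The one visible difference should be where avro.schema.url is listed: under the table parameters for the short form and under the SerDe (storage descriptor) parameters for the long form, because the two queries set it in different places.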
Created 06-13-2017 02:47 AM
Thank you!
Then if I am to use one of the common SerDes, Avro in this case, I can get by with just
CREATE TABLE sample_table STORED AS AVRO TBLPROPERTIES('avro.schema.url' = '<some location>');
rather than use the longer format?
Created 06-13-2017 06:40 PM
Yep, that should work just fine. The longer format is useful if you want to use a custom SerDe such as the OpenCSV SerDe, but the table created using STORED AS AVRO will be identical to the one created explicitly with org.apache.hadoop.hive.serde2.avro.AvroSerDe.
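For comparison, here is roughly what the longer format looks like with the OpenCSV SerDe, a case where no STORED AS shortcut covers the SerDe. It is a minimal sketch: the table name sample_csv and its columns are invented for illustration, and the property values shown are just the usual comma/double-quote/backslash settings.

```sql
-- Hypothetical table and columns. Columns are declared as STRING because
-- the OpenCSV SerDe reads every field as a string.
CREATE TABLE sample_csv (id STRING, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',   -- field delimiter
  'quoteChar'     = '"',   -- character that wraps quoted fields
  'escapeChar'    = '\\'   -- character that escapes quotes inside a field
)
STORED AS TEXTFILE;
```

Here the SerDe has to be named explicitly with ROW FORMAT SERDE, while the underlying file format can still use the TEXTFILE shortcut.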
Created 06-14-2017 02:32 AM
Thank you so much for the explanations!