Multiple methods to create an Avro-based Hive table

An Avro-backed Hive table can be created with either of the two queries below, and as far as I can tell they both result in the same table.

So how do they differ? And if they do differ, when should I use one over the other?

CREATE TABLE sample_table
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url' = '<some location>');

CREATE TABLE sample_table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.url' = 'file:///tmp/schema.avsc')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';
Accepted Solution

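They are the same. Hive has STORED AS shortcuts for a few common file formats, and Avro is one of them: STORED AS AVRO fills in the Avro SerDe, INPUTFORMAT and OUTPUTFORMAT classes for you. You can use one of the following keywords, or specify your own INPUTFORMAT and OUTPUTFORMAT classes.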

file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
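
One way to check the equivalence yourself is to create both variants and compare their metadata. This is only a sketch: the table names sample_table_short and sample_table_long are hypothetical, and the schema path is the one from the question, so adjust it to wherever your .avsc file actually lives.

```sql
-- Hypothetical table names: the two definitions from the question,
-- created under different names so they can be compared side by side.
CREATE TABLE sample_table_short
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url' = 'file:///tmp/schema.avsc');

CREATE TABLE sample_table_long
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.url' = 'file:///tmp/schema.avsc')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';

-- Compare the SerDe library, input format and output format
-- reported for each table.
DESCRIBE FORMATTED sample_table_short;
DESCRIBE FORMATTED sample_table_long;
```

Both tables should report the same SerDe and input/output format classes. The one difference to expect is where avro.schema.url is recorded, since the first statement sets it as a table property and the second as a SerDe property.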

Thank you!

So if I am using one of the common SerDes, Avro in this case, can I get by with just

CREATE TABLE sample_table
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url' = '<some location>');

rather than using the longer format?

Rising Star

Yep, that should work just fine. The longer format is useful when you want a SerDe that has no shortcut, such as the OpenCSV SerDe, but a table created with STORED AS AVRO will be identical to one created explicitly with org.apache.hadoop.hive.serde2.avro.AvroSerDe.
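
For illustration, here is roughly what the longer format looks like with the OpenCSV SerDe. Treat it as a sketch: the table name sample_csv and its columns are made up, and the separator, quote and escape characters are just commonly used values that you would adapt to your files.

```sql
-- Hypothetical table; OpenCSVSerde reads every column as STRING.
CREATE TABLE sample_csv (id STRING, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"',
  'escapeChar'    = '\\'
)
STORED AS TEXTFILE;
```

There is no STORED AS keyword for this SerDe, so the ROW FORMAT SERDE clause has to be written out, while the file format itself can still use the TEXTFILE shortcut.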

Thank you so much for the explanations!