Multiple methods to create AVRO based Hive table

New Contributor

We can create the same table using either of the two queries below.

I have seen that both result in the same table.

So how do they differ? And if they differ, when should I use one over the other?

CREATE TABLE sample_table
STORED AS AVRO
TBLPROPERTIES('avro.schema.url' = '<some location>');

CREATE TABLE sample_table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.url'='file:///tmp/schema.avsc')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';

Accepted Solution

Re: Multiple methods to create AVRO based Hive table

Rising Star

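They are the same. Hive provides STORED AS shortcuts for common file formats, and Avro is one of them: STORED AS AVRO sets the Avro SerDe, input format, and output format for you. You can use one of the following keywords, or specify your own INPUTFORMAT and OUTPUTFORMAT classes.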

file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
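
One way to check this on your own cluster is to create a table each way and compare the metadata. The following is a minimal sketch, assuming Hive 0.14.0 or later. The table names avro_short and avro_long and the two columns are made up for illustration, and the columns are declared inline (rather than through avro.schema.url) only so the example does not depend on a schema file.

```sql
-- Short form: STORED AS AVRO picks the SerDe and both format classes.
CREATE TABLE avro_short (id INT, name STRING)
STORED AS AVRO;

-- Long form: the same three classes written out explicitly.
CREATE TABLE avro_long (id INT, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';

-- Compare the "SerDe Library", "InputFormat" and "OutputFormat" rows.
DESCRIBE FORMATTED avro_short;
DESCRIBE FORMATTED avro_long;
```

If the two forms really are equivalent, those three rows should name the same classes for both tables.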

Re: Multiple methods to create AVRO based Hive table

New Contributor

Thank you!

Then if I am to use one of the common SerDes, Avro in this case, I can get by with just

CREATE TABLE sample_table
STORED AS AVRO
TBLPROPERTIES('avro.schema.url' = '<some location>');

rather than using the longer format?

Re: Multiple methods to create AVRO based Hive table

Rising Star

Yep, that should work just fine. The longer format is useful if you want to use a custom SerDe such as the OpenCSV SerDe, but the table created using STORED AS AVRO will be identical to the one created explicitly with org.apache.hadoop.hive.serde2.avro.AvroSerDe.
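
For a case where the longer format is needed, here is a rough sketch of a table that uses the OpenCSV SerDe. The table name csv_sample and its columns are hypothetical, and the three SERDEPROPERTIES values shown are just one common choice of separator, quote, and escape characters.

```sql
-- There is no STORED AS shortcut for this SerDe, so it is named explicitly.
CREATE TABLE csv_sample (id STRING, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;
```

Note that this SerDe treats every column as STRING, which is why the sketch declares both columns that way.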

Re: Multiple methods to create AVRO based Hive table

New Contributor

Thank you so much for the explanations!
