
HIVE Parquet tables


Explorer

Hi Guys,

 

I understand that Parquet can handle nested structures and has a schema similar to Avro. If so, what is the Parquet equivalent of the Hive Avro DDL statement?

Basically, I want to dynamically create new Parquet tables based on the Avro schemas I receive.

 

Thanks

 

Andrew


Re: HIVE Parquet tables

New Contributor

Hi Andrew -

 

You're correct that Parquet supports nested data types; it implements the record shredding and assembly algorithms from the Dremel paper.

You just need to add the Parquet JARs and set the file format properties on the table, as shown below.

 

-- Add the Parquet JAR (the version will depend on your installation)
ADD JAR parquet-<version>.jar;

-- Create the table with the Parquet SerDe and input/output formats
CREATE TABLE my_table (col1 INT, col2 INT, col3 STRING)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';

Parquet's Maven JARs: http://search.maven.org/#search%7Cga%7C1%7Cparquet

Re: HIVE Parquet tables

Explorer

Thanks Ricky.

 

Can I specify an Avro schema?

Re: HIVE Parquet tables

Master Guru
Presently, the Parquet Hive SerDe supports only Writable-based serialization, not Avro: https://github.com/Parquet/parquet-mr/blob/master/parquet-hive/src/main/java/parquet/hive/serde/Parq...
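
Since the SerDe cannot take an Avro schema directly, one possible workaround is to translate each received Avro schema into Hive column definitions yourself and generate the CREATE TABLE statement from that. Below is a minimal Python sketch of the idea, not a tested production tool. The function names (hive_type, create_table_ddl) and the type mapping are assumptions made for illustration; it handles only primitives, records, arrays, maps, and nullable unions of the form ["null", T], and it reuses the storage clause from the DDL posted earlier in this thread. Whether your SerDe version accepts every nested type it emits is something you would need to verify.

```python
import json

# Avro primitive type -> Hive type (assumed mapping for this sketch)
PRIMITIVES = {
    "boolean": "BOOLEAN",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "string": "STRING",
    "bytes": "BINARY",
}

# Storage clause copied from the CREATE TABLE statement earlier in the thread.
STORAGE_CLAUSE = (
    "ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'\n"
    "STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'\n"
    "OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'"
)


def hive_type(avro_type):
    """Translate one Avro type (already parsed from JSON) into a Hive type string."""
    if isinstance(avro_type, str):
        # Raises KeyError for anything outside the primitive mapping above.
        return PRIMITIVES[avro_type]
    if isinstance(avro_type, list):
        # Only the common nullable union ["null", T] is handled here.
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) != 1:
            raise ValueError("unsupported union: %r" % (avro_type,))
        return hive_type(non_null[0])
    kind = avro_type["type"]
    if kind == "record":
        fields = ",".join(
            "%s:%s" % (f["name"], hive_type(f["type"])) for f in avro_type["fields"]
        )
        return "STRUCT<%s>" % fields
    if kind == "array":
        return "ARRAY<%s>" % hive_type(avro_type["items"])
    if kind == "map":
        # Avro map keys are always strings.
        return "MAP<STRING,%s>" % hive_type(avro_type["values"])
    # Wrapped form such as {"type": "string"}.
    return hive_type(kind)


def create_table_ddl(table_name, avro_schema_json):
    """Build a CREATE TABLE statement from a top-level Avro record schema."""
    schema = json.loads(avro_schema_json)
    if not isinstance(schema, dict) or schema.get("type") != "record":
        raise ValueError("top-level Avro schema must be a record")
    columns = ", ".join(
        "%s %s" % (f["name"], hive_type(f["type"])) for f in schema["fields"]
    )
    return "CREATE TABLE %s (%s)\n%s;" % (table_name, columns, STORAGE_CLAUSE)
```

You would call create_table_ddl("my_table", schema_text) with the Avro schema as a JSON string and run the returned statement through the Hive CLI or your usual client. Field names are passed through unchanged, so names that collide with Hive reserved words would still need quoting or renaming.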