Created 09-06-2013 04:46 AM
Hi Guys,
I understand that Parquet can handle nested structures and has a schema similar to Avro's. If so, what is the Parquet equivalent of the Hive Avro DDL statement?
Basically, I want to dynamically create new Parquet tables based on the Avro schemas I receive.
Thanks
Andrew
Created 09-09-2013 05:18 PM
Hi Andrew -
You're correct that Parquet supports nested data types; it implements the record shredding and assembly algorithms from the Dremel paper.
You just need to add the Parquet JARs and set the file format properties on the table (see below).
# Add the JAR (the version will vary)
hive> add jar parquet-<version>.jar;
# Create your table
hive> CREATE TABLE my_table (col1 INT, col2 INT, col3 STRING)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
Parquet's Maven JARs
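Since the goal is to create these tables dynamically from incoming Avro schemas, one option is to generate the CREATE TABLE statement above from the schema JSON. The following is only a rough sketch, not something from the Parquet or Hive projects: the helper names (hive_type, parquet_ddl) and the Avro-to-Hive type mapping are my own assumptions, it only handles primitives, records, arrays, maps and nullable unions, and you should check that your SerDe version actually accepts the nested column types it emits.

```python
import json

# Assumed mapping from Avro primitive types to Hive column types.
PRIMITIVES = {
    "boolean": "BOOLEAN",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "string": "STRING",
    "bytes": "BINARY",
}


def hive_type(avro_type):
    """Translate one Avro type (already parsed from JSON) into a Hive type string."""
    if isinstance(avro_type, str):
        if avro_type not in PRIMITIVES:
            raise ValueError("unsupported Avro type: %s" % avro_type)
        return PRIMITIVES[avro_type]
    if isinstance(avro_type, list):
        # Only the common nullable union ["null", T] is handled here.
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) != 1:
            raise ValueError("only nullable unions are supported")
        return hive_type(non_null[0])
    kind = avro_type["type"]
    if kind == "record":
        fields = ",".join(
            "%s:%s" % (f["name"], hive_type(f["type"])) for f in avro_type["fields"]
        )
        return "STRUCT<%s>" % fields
    if kind == "array":
        return "ARRAY<%s>" % hive_type(avro_type["items"])
    if kind == "map":
        # Avro map keys are always strings.
        return "MAP<STRING,%s>" % hive_type(avro_type["values"])
    # Wrapped form such as {"type": "string"}.
    return hive_type(kind)


def parquet_ddl(table, schema_json):
    """Build the Hive DDL for a Parquet table from a top-level Avro record schema."""
    schema = json.loads(schema_json)
    if not isinstance(schema, dict) or schema.get("type") != "record":
        raise ValueError("top-level Avro schema must be a record")
    cols = ", ".join(
        "%s %s" % (f["name"], hive_type(f["type"])) for f in schema["fields"]
    )
    return (
        "CREATE TABLE %s (%s)\n"
        "ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'\n"
        "STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'\n"
        "OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';" % (table, cols)
    )
```

Calling parquet_ddl("my_table", schema_text) returns the statement as a string that you can paste into the hive> prompt or save to a script. Note that the sketch does not quote or escape table and field names, so it is only suitable for schemas you trust.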
Created 09-16-2013 12:29 AM
Thanks Ricky.
Can I specify an Avro schema?
Created 10-17-2013 10:29 AM