Created 04-22-2016 06:14 AM
I'm getting the following error while creating a Hive Parquet table with Snappy compression in beeline:
Error: Error while compiling statement: FAILED: ParseException line 19:15 cannot recognize input near 'parquet' '.' 'compress' in table properties list (state=42000,code=40000)
CREATE EXTERNAL TABLE testsnappy ( column bigint )
row format delimited
fields terminated by ','
STORED as PARQUET
LOCATION 'path'
TBLPROPERTIES ("parquet.compress"="SNAPPY");
Also, is there a way to set the compression format for already created tables?
Created 04-22-2016 07:58 AM
If you create a Hive table over an existing data set in HDFS, you need to tell Hive about the format of the files as they are on the filesystem ("schema on read"). For text-based files, use the keywords STORED AS TEXTFILE. Once you have declared your external table, you can convert the data into a columnar format like Parquet or ORC using a CREATE TABLE ... AS SELECT (CTAS) statement.
CREATE EXTERNAL TABLE sourcetable (col bigint) row format delimited fields terminated by "," STORED as TEXTFILE LOCATION 'hdfs:///data/sourcetable';
Once the data is mapped, you can convert it to other formats like parquet:
set parquet.compression=SNAPPY; -- this is actually the default
CREATE TABLE testsnappy_pq STORED AS PARQUET AS SELECT * FROM sourcetable;
For the Hive-optimized ORC format, the syntax is slightly different:
CREATE TABLE testsnappy_orc STORED AS ORC TBLPROPERTIES("orc.compress"="snappy") AS SELECT * FROM sourcetable;
Created 04-22-2016 08:17 AM
Just a little comment. While in old versions of HDP Snappy provided performance benefits over ZLIB for ORC files, this is not true anymore. ZLIB has roughly three times better compression AND is now as fast as or faster than Snappy for most tables.
Created 04-25-2016 09:26 AM
All I need is to do this on Hive external tables directly.
1. My DDL statement above does not work when I try to create a Parquet external table with Snappy compression.
2. Is there a way to change the compression of an existing Hive external table from Snappy to ZLIB?
Created 04-25-2016 09:40 AM
2. Your only option is a CTAS, i.e. create a new table "as" the old one, compressed with ZLIB, then rename the tables. You can do that with external tables as well. However, this advice only applies to newer Hive versions with ORC and Tez; for Parquet, Snappy may still be better.
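A minimal sketch of the CTAS-and-rename approach, assuming an existing ORC table named mytable (the table and suffix names here are just placeholders):

```sql
-- Copy the data into a new table, recompressed with ZLIB
CREATE TABLE mytable_zlib STORED AS ORC TBLPROPERTIES ("orc.compress"="ZLIB")
AS SELECT * FROM mytable;

-- Swap the tables by renaming
ALTER TABLE mytable RENAME TO mytable_snappy_old;
ALTER TABLE mytable_zlib RENAME TO mytable;
```

For an external table, the new data lands wherever the CTAS writes it, so check the new table's LOCATION afterwards and drop the old table (and its files) only once you've verified the copy.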
Created 04-26-2016 10:56 PM
The CREATE EXTERNAL TABLE statement must match the format on disk. If the files are in a self-describing format like parquet, you should not need to specify any table properties to read them (remove the TBLPROPERTIES line). If you want to convert to a new format, including a different compression algorithm, you will need to create a new table.
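A sketch of the original DDL with the TBLPROPERTIES line removed, assuming the files at the given location are already Snappy-compressed Parquet (the column is renamed to col1 here because "column" is a reserved word in some Hive versions):

```sql
-- Parquet files record their compression codec in the file footer,
-- so no compression property is needed to read them
CREATE EXTERNAL TABLE testsnappy (col1 BIGINT)
STORED AS PARQUET
LOCATION 'path';
```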