<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive table format and compression in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111357#M74205</link>
    <description>&lt;P&gt;
	If you create a Hive table over an &lt;EM&gt;existing&lt;/EM&gt; data set in HDFS, you need to tell Hive about the format of the files as they sit on the filesystem ("schema on read"). For text-based files, use the keywords STORED AS TEXTFILE. Once you have declared your external table, you can convert the data into a columnar format such as Parquet or ORC with a CREATE TABLE ... AS SELECT (CTAS) statement. &lt;/P&gt;&lt;PRE&gt;CREATE EXTERNAL TABLE sourcetable (col bigint)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs:///data/sourcetable';
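-- Optional sanity check: confirm Hive can read the raw text files
SELECT * FROM sourcetable LIMIT 5;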
&lt;/PRE&gt;&lt;P&gt;Once the data is mapped, you can convert it to other formats such as Parquet:&lt;/P&gt;&lt;PRE&gt;SET parquet.compression=SNAPPY; -- this is already the default
CREATE TABLE testsnappy_pq
STORED AS PARQUET
AS SELECT * FROM sourcetable;
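-- Optional check: inspect the new table's storage format and compression codec
DESCRIBE FORMATTED testsnappy_pq;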
&lt;/PRE&gt;&lt;P&gt;For the Hive-optimized ORC format, the syntax is slightly different:&lt;/P&gt;&lt;PRE&gt;CREATE TABLE testsnappy_orc
STORED AS ORC
TBLPROPERTIES("orc.compress"="SNAPPY")
AS SELECT * FROM sourcetable;&lt;/PRE&gt;</description>
    <pubDate>Fri, 22 Apr 2016 14:58:44 GMT</pubDate>
    <dc:creator>jpp</dc:creator>
    <dc:date>2016-04-22T14:58:44Z</dc:date>
    <item>
      <title>Hive table format and compression</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111356#M74204</link>
      <description>&lt;P&gt;I am getting the error below while creating a Hive Parquet table with Snappy compression in Beeline. &lt;/P&gt;&lt;P&gt;Error: Error while compiling statement: FAILED: ParseException line 19:15 cannot recognize input near 'parquet' '.' 'compress' in table properties list (state=42000,code=40000)&lt;/P&gt;&lt;P&gt;CREATE EXTERNAL TABLE testsnappy  ( column bigint )&lt;/P&gt;&lt;P&gt; row format delimited &lt;/P&gt;&lt;P&gt; fields terminated by ',' &lt;/P&gt;&lt;P&gt; STORED as PARQUET &lt;/P&gt;&lt;P&gt; LOCATION 'path' &lt;/P&gt;&lt;P&gt; TBLPROPERTIES ("parquet.compress"="SNAPPY") " ;&lt;/P&gt;&lt;P&gt;Also, is there a way to set the compression format for already-created tables?&lt;/P&gt;</description>
      <pubDate>Fri, 22 Apr 2016 13:14:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111356#M74204</guid>
      <dc:creator>Aswanth11</dc:creator>
      <dc:date>2016-04-22T13:14:35Z</dc:date>
    </item>
    <item>
      <title>Re: Hive table format and compression</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111357#M74205</link>
      <description>&lt;P&gt;
	If you create a Hive table over an &lt;EM&gt;existing&lt;/EM&gt; data set in HDFS, you need to tell Hive about the format of the files as they sit on the filesystem ("schema on read"). For text-based files, use the keywords STORED AS TEXTFILE. Once you have declared your external table, you can convert the data into a columnar format such as Parquet or ORC with a CREATE TABLE ... AS SELECT (CTAS) statement. &lt;/P&gt;&lt;PRE&gt;CREATE EXTERNAL TABLE sourcetable (col bigint)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs:///data/sourcetable';
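-- Optional sanity check: confirm Hive can read the raw text files
SELECT * FROM sourcetable LIMIT 5;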
&lt;/PRE&gt;&lt;P&gt;Once the data is mapped, you can convert it to other formats such as Parquet:&lt;/P&gt;&lt;PRE&gt;SET parquet.compression=SNAPPY; -- this is already the default
CREATE TABLE testsnappy_pq
STORED AS PARQUET
AS SELECT * FROM sourcetable;
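-- Optional check: inspect the new table's storage format and compression codec
DESCRIBE FORMATTED testsnappy_pq;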
&lt;/PRE&gt;&lt;P&gt;For the Hive-optimized ORC format, the syntax is slightly different:&lt;/P&gt;&lt;PRE&gt;CREATE TABLE testsnappy_orc
STORED AS ORC
TBLPROPERTIES("orc.compress"="SNAPPY")
AS SELECT * FROM sourcetable;&lt;/PRE&gt;</description>
      <pubDate>Fri, 22 Apr 2016 14:58:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111357#M74205</guid>
      <dc:creator>jpp</dc:creator>
      <dc:date>2016-04-22T14:58:44Z</dc:date>
    </item>
    <item>
      <title>Re: Hive table format and compression</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111358#M74206</link>
      <description>&lt;P&gt;Just a little comment: while in old versions of HDP Snappy provided performance benefits over ZLIB for ORC files, this is not true anymore. ZLIB now gives roughly three times better compression AND is as fast as or faster than Snappy for most tables.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Apr 2016 15:17:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111358#M74206</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-04-22T15:17:10Z</dc:date>
    </item>
    <item>
      <title>Re: Hive table format and compression</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111359#M74207</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/168/bleonhardi.html" nodeid="168"&gt;@Benjamin Leonhardi&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Everything I need to do is on Hive external tables directly.&lt;/P&gt;&lt;P&gt;1. My DDL statement above was not working when I tried to create a Parquet external table with Snappy compression.&lt;/P&gt;&lt;P&gt;2. Is there a way to change the compression of an existing Hive external table from Snappy to ZLIB?&lt;/P&gt;</description>
      <pubDate>Mon, 25 Apr 2016 16:26:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111359#M74207</guid>
      <dc:creator>Aswanth11</dc:creator>
      <dc:date>2016-04-25T16:26:28Z</dc:date>
    </item>
    <item>
      <title>Re: Hive table format and compression</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111360#M74208</link>
      <description>&lt;P&gt;2. Your only option is a CTAS, i.e. create a new table "as" the old one, compressed with ZLIB, then rename them. You can do that with external tables as well. However, this holds only for newer Hive versions with ORC and Tez; for Parquet, Snappy may still be the better choice. &lt;/P&gt;</description>
      <pubDate>Mon, 25 Apr 2016 16:40:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111360#M74208</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-04-25T16:40:39Z</dc:date>
    </item>
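    <!-- A minimal HiveQL sketch (table names are hypothetical) of the CTAS plus rename flow described in the reply above:
         CREATE TABLE mytable_zlib STORED AS ORC TBLPROPERTIES("orc.compress"="ZLIB") AS SELECT * FROM mytable;
         ALTER TABLE mytable RENAME TO mytable_old;
         ALTER TABLE mytable_zlib RENAME TO mytable;
    -->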
    <item>
      <title>Re: Hive table format and compression</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111361#M74209</link>
      <description>&lt;P&gt;The CREATE EXTERNAL TABLE statement must match the format on disk. If the files are in a self-describing format like Parquet, you should not need to specify any table properties to read them (remove the TBLPROPERTIES line). If you want to convert the data to a new format, including a different compression algorithm, you will need to create a new table.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Apr 2016 05:56:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-table-format-and-compression/m-p/111361#M74209</guid>
      <dc:creator>jpp</dc:creator>
      <dc:date>2016-04-27T05:56:12Z</dc:date>
    </item>
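    <!-- A minimal sketch of the corrected DDL for reading self-describing Parquet files (location is hypothetical); note that both ROW FORMAT DELIMITED and TBLPROPERTIES are dropped:
         CREATE EXTERNAL TABLE testsnappy (col bigint)
         STORED AS PARQUET
         LOCATION 'hdfs:///data/testsnappy';
    -->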
  </channel>
</rss>

