
Hive table format and compression

Rising Star

I'm getting this error while creating a Hive Parquet table with Snappy compression in Beeline:

Error: Error while compiling statement: FAILED: ParseException line 19:15 cannot recognize input near 'parquet' '.' 'compress' in table properties list (state=42000,code=40000)

CREATE EXTERNAL TABLE testsnappy ( column bigint )
row format delimited
fields terminated by ','
STORED as PARQUET
LOCATION 'path'
TBLPROPERTIES ("parquet.compress"="SNAPPY");

Also, is there a way to set the compression format on already-created tables?

1 ACCEPTED SOLUTION

Rising Star

If you create a Hive table over an existing data set in HDFS, you need to tell Hive about the format of the files as they are on the filesystem ("schema on read"). For text-based files, use the keywords STORED AS TEXTFILE. Once you have declared your external table, you can convert the data into a columnar format like Parquet or ORC using CREATE TABLE ... AS SELECT.

CREATE EXTERNAL TABLE sourcetable (col bigint)
row format delimited
fields terminated by ","
STORED as TEXTFILE
LOCATION 'hdfs:///data/sourcetable';

Once the data is mapped, you can convert it to other formats like parquet:

set parquet.compression=SNAPPY; -- this is actually the default
CREATE TABLE testsnappy_pq
STORED AS PARQUET
AS SELECT * FROM sourcetable;

For the Hive-optimized ORC format, the syntax is slightly different:

CREATE TABLE testsnappy_orc
STORED AS ORC
TBLPROPERTIES("orc.compress"="SNAPPY")
AS SELECT * FROM sourcetable;
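One way to sanity-check the result: DESCRIBE FORMATTED lists the input/output formats and the table parameters of the new table.

DESCRIBE FORMATTED testsnappy_pq;
-- the same check works for testsnappy_orc

Keep in mind that a session-level "set parquet.compression=SNAPPY;" only controls how the data files are written; it will not show up as a table property in that output.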


5 REPLIES

Master Guru

Just a little comment: while in older HDP versions Snappy provided performance benefits over ZLIB ("zip") for ORC files, this is not true anymore. ZLIB now has about three times better compression AND is as fast as or faster than Snappy for most tables.
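For reference, a minimal sketch of the ZLIB variant of the ORC example above (the table name is a placeholder; "ZLIB" is the value orc.compress expects for zip compression):

CREATE TABLE testzlib_orc
STORED AS ORC
TBLPROPERTIES("orc.compress"="ZLIB")
AS SELECT * FROM sourcetable;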

Rising Star

@Benjamin Leonhardi

All I need to do is on Hive external tables directly.

1. My DDL statement above was not working when I tried to create a Parquet external table with Snappy compression.

2. Is there a way to alter the compression from Snappy to ZLIB on an existing Hive external table?

Master Guru

2. Your only option is a CTAS, i.e. create a new table "as" the old one, compressed with ZLIB, and then rename the tables. You can do that with external tables as well. However, this only holds for newer Hive versions with ORC and Tez; for Parquet, Snappy may still be better.
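A minimal sketch of that CTAS-and-rename flow (table names are placeholders; for an external table you would additionally give the new table its own LOCATION):

-- rewrite the data into a new table compressed with ZLIB
CREATE TABLE mytable_zlib
STORED AS ORC
TBLPROPERTIES("orc.compress"="ZLIB")
AS SELECT * FROM mytable;

-- swap the names once the copy is verified
ALTER TABLE mytable RENAME TO mytable_old;
ALTER TABLE mytable_zlib RENAME TO mytable;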

Rising Star

The CREATE EXTERNAL TABLE statement must match the format of the data on disk. If the files are in a self-describing format like Parquet, you should not need to specify any table properties to read them (remove the TBLPROPERTIES line). If you want to convert the data to a new format, including a different compression algorithm, you will need to create a new table.
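Applied to the original DDL, a sketch of what that would look like ('path' and the column name col are placeholders; the ROW FORMAT clause is dropped too, since it does not apply to Parquet files):

CREATE EXTERNAL TABLE testsnappy ( col bigint )
STORED AS PARQUET
LOCATION 'path';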