Support Questions

Find answers, ask questions, and share your expertise

how to compress existed table in Hive

avatar

Hi,

My hive table is having 200 TB of data and it's not compressed.

Could you please help me how to compress this table without dropping.

Thanks,

Ram

7 REPLIES 7

avatar

avatar
Master Guru

Really the only way is his CTAS if you want to change compression format or add compression. .

avatar

could you please provide the sample code.

avatar
Master Guru

As simple as this:

CREATE TABLE t1_orc STORED AS ORC AS SELECT * FROM <your-existing-table>;

Note that if you have a single 200T table, this is going to take a while. You can test on a smaller table first.

avatar
Master Guru

If your table is partitioned you have to create it first as "STORED AS ORC" and then do " INSERT INTO" it listing all fields in SELECT. Also enable dynamic partitions.

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
create table if not exists t1 (a int, b int) partitioned by (c int);       -- your original table
create table t1orc (a int, b int) partitioned by (c int) stored as ORC;    -- your compressed table
insert into table t1orc partition(c) select a, b, c from t1;

avatar

Hi,

Thanks for your prompt response.

My table is partitioned table if I create new table with CTAS will it be also partitioned?

Thanks

avatar
Rising Star

CTAS has these restrictions:

  • The target table cannot be a partitioned table.
  • The target table cannot be an external table.
  • The target table cannot be a list bucketing table.