- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
how to compress existed table in Hive
- Labels:
-
Apache Hive
Created 03-16-2017 04:59 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
My hive table is having 200 TB of data and it's not compressed.
Could you please help me how to compress this table without dropping.
Thanks,
Ram
Created 03-16-2017 05:01 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This post has a pretty good explanation:
https://community.hortonworks.com/questions/28856/hive-table-format-and-compression.html
Created 03-16-2017 05:11 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Really the only way is his CTAS if you want to change compression format or add compression. .
Created 03-16-2017 05:17 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
could you please provide the sample code.
Created 03-16-2017 05:42 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
As simple as this:
CREATE TABLE t1_orc STORED AS ORC AS SELECT * FROM <your-existing-table>;
Note that if you have a single 200T table, this is going to take a while. You can test on a smaller table first.
Created 03-16-2017 07:48 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If your table is partitioned you have to create it first as "STORED AS ORC" and then do " INSERT INTO" it listing all fields in SELECT. Also enable dynamic partitions.
set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; create table if not exists t1 (a int, b int) partitioned by (c int); -- your original table create table t1orc (a int, b int) partitioned by (c int) stored as ORC; -- your compressed table insert into table t1orc partition(c) select a, b, c from t1;
Created 03-16-2017 05:53 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Thanks for your prompt response.
My table is partitioned table if I create new table with CTAS will it be also partitioned?
Thanks
Created 03-16-2017 06:05 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
CTAS has these restrictions:
- The target table cannot be a partitioned table.
- The target table cannot be an external table.
- The target table cannot be a list bucketing table.