New Contributor
Posts: 1
Registered: 04-10-2018

How to get compressed HDFS files using Impala

I am teaching myself Impala and trying to enable compression for a table, but I don't see any HDFS files with a compressed extension (.gz, .bz2) being created in the Hadoop filesystem.

I am referring to https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_txtfile.html, but I'm not sure how the final compressed files get created. When I use Sqoop, I do get compressed files. Please advise.

I expect that .gz or .bz2 files will be created in HDFS when data is inserted into the managed table (if compression is enabled). Is my assumption correct, or do I need to run an extra command to compress the files after inserting the data?

create table csv_compressed (a string, b string, c string)
row format delimited fields terminated by ",";
insert into csv_compressed values
('one - uncompressed', 'two - uncompressed', 'three - uncompressed'),
('abc - uncompressed', 'xyz - uncompressed', '123 - uncompressed');
-- ...create equivalent .gz, .bz2, and .snappy files outside Impala and copy them into the same table directory...
refresh csv_compressed;

select * from csv_compressed;
+--------------------+--------------------+----------------------+
| a                  | b                  | c                    |
+--------------------+--------------------+----------------------+
| one - snappy       | two - snappy       | three - snappy       |
| one - uncompressed | two - uncompressed | three - uncompressed |
| abc - uncompressed | xyz - uncompressed | 123 - uncompressed   |
| one - bz2          | two - bz2          | three - bz2          |
| abc - bz2          | xyz - bz2          | 123 - bz2            |
| one - gzip         | two - gzip         | three - gzip         |
| abc - gzip         | xyz - gzip         | 123 - gzip           |
+--------------------+--------------------+----------------------+
$ hdfs dfs -ls 'hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/';
...truncated for readability...
75 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed.snappy
79 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_bz2.csv.bz2
80 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_gzip.csv.gz
116 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/dd414df64d67

Cloudera Employee
Posts: 6
Registered: 08-21-2017

Re: How to get compressed HDFS files using Impala

Hi,

The example combines two things: inserting uncompressed rows with INSERT, and manually placing already-compressed files in the table's HDFS directory. It then reads the contents of all these files, which use different compression codecs.
That kind of INSERT ... VALUES statement, with the rows written out as literals, is typically used only for trying things out.

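Impala's INSERT writes text tables as uncompressed files (the 116-byte file with no extension in the listing holds the two uncompressed rows), so you won't see .gz or .bz2 files appear after inserting data, even with compression in mind. Compressed text files have to be produced outside Impala, for example through Hive or by compressing the files yourself, and then brought in with LOAD DATA or by copying them into the table directory. Depending on what type of compression you want, you have several options for this, as described in the page you linked.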
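For text tables, one of those options, and the one the documentation example relies on, is to compress the data files yourself and copy them into the table's directory. Below is a minimal shell sketch of that step, assuming gzip and bzip2 are installed on the machine where you run it. The local file names mirror the example's listing, and the HDFS path and database name (file_formats) are taken from that listing; the hdfs and impala-shell commands are left as comments because they need a cluster.

```shell
#!/bin/sh
# Sketch (assumes gzip and bzip2 are installed locally): build compressed
# copies of two CSV rows matching the csv_compressed layout (a, b, c).
set -e

printf 'one - gzip,two - gzip,three - gzip\nabc - gzip,xyz - gzip,123 - gzip\n' \
  > csv_compressed_gzip.csv
printf 'one - bz2,two - bz2,three - bz2\nabc - bz2,xyz - bz2,123 - bz2\n' \
  > csv_compressed_bz2.csv

# Each tool replaces its input with a compressed file; -f overwrites any
# output left over from a previous run.
gzip -f csv_compressed_gzip.csv    # -> csv_compressed_gzip.csv.gz
bzip2 -f csv_compressed_bz2.csv    # -> csv_compressed_bz2.csv.bz2

# On the cluster (not executed here): copy the files into the table's
# directory, then REFRESH so Impala notices files added outside it.
# Impala chooses the decompression codec from the file extension.
#   hdfs dfs -put csv_compressed_gzip.csv.gz csv_compressed_bz2.csv.bz2 \
#     /user/hive/warehouse/file_formats.db/csv_compressed/
#   impala-shell -q 'refresh file_formats.csv_compressed'
```

After the REFRESH, a SELECT on csv_compressed should return the gzip and bz2 rows alongside the uncompressed ones, as in the example output. The .gz and .bz2 extensions matter here, since they are how Impala recognizes the files as compressed.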