
how to get compressed hdfs file using impala

New Contributor

I am teaching myself Impala and trying to enable compression for a table, but I do not see files with a compressed extension (.gz, .bzip2) being created in the Hadoop filesystem.

I am referring to the example below, but I am not sure how the final compressed files are created. When I use Sqoop, I do get compressed files. Please guide.


I expect .gz or .bzip files to be created in HDFS as soon as we start inserting data into the managed table (if compression is enabled). Is my assumption correct, or do I need to run an extra command to compress the files after the insert finishes?



create table csv_compressed (a string, b string, c string)
row format delimited fields terminated by ",";
insert into csv_compressed values
('one - uncompressed', 'two - uncompressed', 'three - uncompressed'),
('abc - uncompressed', 'xyz - uncompressed', '123 - uncompressed');
...make equivalent .gz, .bz2, and .snappy files and load them into same table directory...

select * from csv_compressed;
a   b   c
one - snappy    two - snappy    three - snappy
one - uncompressed  two - uncompressed  three - uncompressed
abc - uncompressed  xyz - uncompressed  123 - uncompressed
one - bz2   two - bz2   three - bz2
abc - bz2   xyz - bz2   123 - bz2
one - gzip  two - gzip  three - gzip
abc - gzip  xyz - gzip  123 - gzip
$ hdfs dfs -ls 'hdfs://';
...truncated for readability...
75 hdfs://
79 hdfs://
80 hdfs://
116 hdfs://





Re: how to get compressed hdfs file using impala

Cloudera Employee

The example combines inserting uncompressed rows with manually placing compressed files in HDFS, and then reads the contents of all these files, each of which uses a different type of compression.
That type of insert statement (one row at a time) is typically just used for testing things out.
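
To make the "manually placing compressed files" step concrete, here is a minimal sketch of how the gzip file in that example could be produced. The file name rows_gzip.csv and the TABLE_DIR variable are placeholders I made up for illustration; the real table directory is truncated in the listing above, so the upload only runs if you supply it yourself.

```shell
#!/bin/sh
# Sketch: build a small CSV and a gzip-compressed copy of it locally.
set -eu

workdir=$(mktemp -d)

# Rows in the same three-column, comma-delimited layout as csv_compressed.
printf '%s\n' \
  'one - gzip,two - gzip,three - gzip' \
  'abc - gzip,xyz - gzip,123 - gzip' > "$workdir/rows_gzip.csv"

# gzip -c writes to stdout, so the uncompressed copy is kept for comparison.
gzip -c "$workdir/rows_gzip.csv" > "$workdir/rows_gzip.csv.gz"

# TABLE_DIR is a placeholder for the table's HDFS directory (not given in this thread).
TABLE_DIR=${TABLE_DIR:-}
if [ -n "$TABLE_DIR" ] && command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -put "$workdir/rows_gzip.csv.gz" "$TABLE_DIR/"
else
  echo "skipping upload: set TABLE_DIR and run where the hdfs client is installed"
fi
```

After files are added to the table directory this way, a "refresh csv_compressed;" in impala-shell should be needed before the new rows show up in a select.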

Depending on the type of compression you want, you have several options for compressing the data, as described in the link you've looked at.
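
As one illustration of those options, here is a rough sketch that has Impala itself write compressed data by copying the text table into a Parquet table with the COMPRESSION_CODEC query option set. The table name parquet_gzip is hypothetical, and the script assumes impala-shell can reach your cluster with its default connection settings. As far as I know, an Impala INSERT into a text table writes uncompressed files, and with Parquet the compression is applied inside the data files, so I would not expect a .gz extension on the resulting file names either way.

```shell
#!/bin/sh
# Sketch: write the statements to a file, then run them if impala-shell is available.
set -eu

sql_file=$(mktemp)

cat > "$sql_file" <<'SQL'
-- parquet_gzip is a hypothetical table name.
create table parquet_gzip (a string, b string, c string) stored as parquet;
-- COMPRESSION_CODEC affects Parquet data files written by later inserts in this session.
set compression_codec=gzip;
insert into parquet_gzip select * from csv_compressed;
SQL

if command -v impala-shell >/dev/null 2>&1; then
  impala-shell -f "$sql_file"
else
  echo "impala-shell not found; statements saved in $sql_file"
fi
```

For compressed text files specifically, the route shown in the example (compress outside Impala, then place the files in the table directory) or loading through Hive is the approach I would try first.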