How to compress HDFS data using zlib compression?


Super Collaborator

Hi Community team,

Can anyone help me enable zlib compression in HDP 2.4.2?

Thanks in advance

1 ACCEPTED SOLUTION

Re: how to compress the hdfs data using zlib compression ??

@subhash parise

As @Artem Ervits shared, you get compression when you store your data in ORC format. However, if you want to keep "raw" data on HDFS and compress it selectively, you can use a simple Pig script: load the data from HDFS and then write it out again with compression enabled.

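-- Turn on compression for everything this script stores.
set output.compression.enabled true;
set output.compression.codec org.apache.hadoop.io.compress.BZip2Codec;

-- Read the uncompressed files and write them back out through the codec set above.
inputFiles = LOAD '/input/directory/uncompressed' USING PigStorage();
STORE inputFiles INTO '/output/directory/compressed/' USING PigStorage();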

You can either leave the uncompressed data or remove it, depending on what you are doing. This is an approach that I've used.
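
If you do plan to remove the uncompressed copy, it may be worth comparing row counts first. The following is only a rough sketch: it reuses the placeholder paths from the script above, the relation names are made up for illustration, and it assumes PigStorage picks the decompressor from the file extension of the part files (which, as far as I know, it does for .bz2 and .gz).

```pig
-- Load both copies; compressed part files should be decompressed on read.
original   = LOAD '/input/directory/uncompressed' USING PigStorage();
compressed = LOAD '/output/directory/compressed/' USING PigStorage();

-- Count every row in each copy, including rows whose first field is null.
original_grp     = GROUP original ALL;
original_count   = FOREACH original_grp GENERATE COUNT_STAR(original);

compressed_grp   = GROUP compressed ALL;
compressed_count = FOREACH compressed_grp GENERATE COUNT_STAR(compressed);

-- The two numbers printed here should match before the original is deleted.
DUMP original_count;
DUMP compressed_count;
```

A matching count does not prove the contents are identical, but it is a cheap sanity check.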

You can use different codecs depending on your needs:

set output.compression.codec com.hadoop.compression.lzo.LzopCodec;
set output.compression.codec org.apache.hadoop.io.compress.GzipCodec;
set output.compression.codec org.apache.hadoop.io.compress.BZip2Codec;


Re: how to compress the hdfs data using zlib compression ??

Mentor

Take a look at this article; it covers ways of setting compression in Hive, including zlib: http://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/

It would help if you specified which product you're trying to enable zlib for. Since you categorized the question under data ingestion, I will assume it's for Sqoop. Here's an example of a Sqoop import using compression; just replace the Snappy codec class with the zlib one: https://community.hortonworks.com/questions/29648/sqoop-import-to-hive-with-compression.html


Re: how to compress the hdfs data using zlib compression ??

Super Collaborator
@Artem Ervits: Thank you for replying to my question. I am looking for zlib compression at the HDFS level.

Re: how to compress the hdfs data using zlib compression ??

Super Collaborator

@Michael Young: Could you please give me the syntax to set the compression codec to zlib?


Re: how to compress the hdfs data using zlib compression ??

@subhash parise

The default codec is zlib. If you want to explicitly set it to zlib, use the following:

set output.compression.codec org.apache.hadoop.io.compress.DefaultCodec;
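
Put together with the load/store pattern from earlier in this thread, a complete script might look roughly like the sketch below. The paths are placeholders, and the output directory name compressed_zlib is made up for illustration.

```pig
-- Enable output compression and select the zlib (DEFLATE) codec.
set output.compression.enabled true;
set output.compression.codec org.apache.hadoop.io.compress.DefaultCodec;

-- Placeholder paths: point these at your own input and output directories.
inputFiles = LOAD '/input/directory/uncompressed' USING PigStorage();
STORE inputFiles INTO '/output/directory/compressed_zlib/' USING PigStorage();
```

With DefaultCodec the part files in the output directory should end in .deflate, which is a quick way to confirm the codec was applied.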

Re: how to compress the hdfs data using zlib compression ??

@subhash parise

I just posted an article demonstrating a very simple Pig + Hive example showing HDFS compression.

https://community.hortonworks.com/content/kbentry/50921/using-pig-to-convert-uncompressed-data-to-co...


Re: how to compress the hdfs data using zlib compression ??

Sample Hive script:

-- External table stored as ORC with zlib compression.
-- ORC is a binary columnar format with its own SerDe, so no
-- "row format delimited" clause is needed here.
CREATE EXTERNAL TABLE test.temp3
(
  cat_0 bigint,
  cat_1 bigint,
  cat_2 bigint,
  cat_3 bigint,
  cat_4 bigint,
  cat_5 bigint,
  cat_6 bigint,
  cat_7 bigint,
  cat_8 bigint,
  cat_9 bigint
)
stored as ORC location '/test/'
tblproperties ("orc.compress"="ZLIB");


Re: how to compress the hdfs data using zlib compression ??

Super Collaborator

@Divakar Annapureddy: Thank you for replying to my question. My case is a bit different: I need the zlib codec for HDFS data (plain Hadoop files).
