Created 08-10-2016 09:47 AM
Hi Community team,
Can anyone help me enable zlib compression in HDP 2.4.2?
Thanks in advance
Created 08-10-2016 12:12 PM
Take a look at this article. It covers ways of setting compression in Hive, including zlib: http://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/
It would help if you specified which product you are trying to enable zlib for. Since you categorized the question under data ingestion, I will assume it's for Sqoop. Here's an example of how to run Sqoop with compression; just replace the Snappy codec class with the zlib one: https://community.hortonworks.com/questions/29648/sqoop-import-to-hive-with-compression.html
Created 08-10-2016 01:59 PM
Thank you for replying to my question. I am looking for zlib compression at the HDFS level.
Created 08-10-2016 02:11 PM
As @Artem Ervits shared, you get compression when storing your data in ORC format. However, if you want to store "raw" data on HDFS and compress it selectively, you can use a simple Pig script: load the data from HDFS and then write it out again with compression enabled.
set output.compression.enabled true;
set output.compression.codec org.apache.hadoop.io.compress.BZip2Codec;
inputFiles = LOAD '/input/directory/uncompressed' using PigStorage();
STORE inputFiles INTO '/output/directory/compressed/' USING PigStorage();
You can either leave the uncompressed data or remove it, depending on what you are doing. This is an approach that I've used.
You can use different codecs depending on your needs. Set one of the following:
set output.compression.codec com.hadoop.compression.lzo.LzopCodec;
set output.compression.codec org.apache.hadoop.io.compress.GzipCodec;
set output.compression.codec org.apache.hadoop.io.compress.BZip2Codec;
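To get a rough feel for how the codec choice affects output size before running the Pig job, here is a minimal sketch using Python's standard library. It is not part of the Pig workflow above: the `compressed_sizes` helper and the sample rows are hypothetical, and the stdlib `zlib`, `gzip`, and `bz2` modules only approximate what the corresponding Hadoop codecs do. LZO is left out because Python has no standard module for it.

```python
import bz2
import gzip
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes for each stdlib codec."""
    return {
        "zlib": len(zlib.compress(data)),
        "gzip": len(gzip.compress(data)),
        "bzip2": len(bz2.compress(data)),
    }


# Hypothetical sample: repetitive delimited rows, similar to raw text on HDFS.
sample = b"".join(b"%d,alpha,beta,gamma\n" % i for i in range(10000))

sizes = compressed_sizes(sample)
for name, size in sizes.items():
    print(f"{name}: {size} bytes ({size / len(sample):.1%} of original)")
```

Size is only one factor. For example, bzip2 files can be split across mappers while gzip files cannot, so it is worth testing on a sample of your real data.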
Created 08-10-2016 03:05 PM
@Michael Young: Could you please give me the syntax for setting the compression codec to zlib?
Created 08-10-2016 03:38 PM
The default codec is zlib. If you want to set it explicitly, use DefaultCodec, which is Hadoop's zlib-based codec:
set output.compression.codec org.apache.hadoop.io.compress.DefaultCodec;
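One way to confirm that the output really is zlib-compressed is to copy a part file to local disk (for example with `hdfs dfs -get`) and decompress it with Python's `zlib` module. The sketch below assumes that DefaultCodec part files are plain zlib streams, usually with a `.deflate` extension. The `read_zlib_file` helper and the file name `part-m-00000.deflate` are hypothetical, and the script writes its own stand-in file so that it runs without a cluster.

```python
import os
import tempfile
import zlib


def read_zlib_file(path: str, chunk_size: int = 64 * 1024) -> bytes:
    """Decompress a zlib-format file in chunks and return the raw bytes."""
    decompressor = zlib.decompressobj()
    parts = []
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            parts.append(decompressor.decompress(chunk))
    parts.append(decompressor.flush())
    return b"".join(parts)


# Stand-in for a part file fetched from HDFS; the file name is hypothetical.
original = b"1,2,3\n4,5,6\n"
with tempfile.TemporaryDirectory() as tmp:
    local_copy = os.path.join(tmp, "part-m-00000.deflate")
    with open(local_copy, "wb") as handle:
        handle.write(zlib.compress(original))
    restored = read_zlib_file(local_copy)

print(restored == original)  # prints True
```

If a real part file fails with `zlib.error`, it was probably written with a different codec than expected.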
Created 08-11-2016 02:06 PM
I just posted an article demonstrating a very simple Pig + Hive example showing HDFS compression.
Created 08-10-2016 02:13 PM
Here is the link for more information:
Created 08-10-2016 02:19 PM
Sample Hive script:
CREATE EXTERNAL TABLE test.temp3
(
    cat_0 bigint,
    cat_1 bigint,
    cat_2 bigint,
    cat_3 bigint,
    cat_4 bigint,
    cat_5 bigint,
    cat_6 bigint,
    cat_7 bigint,
    cat_8 bigint,
    cat_9 bigint
)
row format delimited fields terminated by ','
stored as ORC location '/test/'
tblproperties ("orc.compress"="ZLIB");
Created 08-10-2016 03:07 PM
@Divakar Annapureddy: Thank you for replying to my question. My case is a bit different: I need the zlib codec for HDFS data (Hadoop files).