LZMA compression codec support
Labels: Apache Hadoop
Created on ‎01-09-2016 05:09 PM - edited ‎09-16-2022 02:56 AM
Hi experts!
It seems that the LZMA algorithm could be pretty suitable for some Hadoop use cases (like storing historical, immutable data). Does anyone know whether it is possible to implement it somehow, or to reuse an existing library?
Any ideas are very welcome!
Thanks!
Created ‎01-09-2016 09:46 PM
https://issues.apache.org/jira/browse/HADOOP-6837?focusedCommentId=13687660&page=com.atlassian.jira....
It looks like you can try https://github.com/yongtang/hadoop-xz. It appears
to be pure Java rather than backed by a native library, but that is not
necessarily a bad thing given LZMA's true goals.
Created ‎01-11-2016 07:19 PM
Thank you for your reply!
It seems promising, but as far as I understand it requires rebuilding your Hadoop distribution.
What if I just have the CDH distribution and want to plug this in as an extension (for example, the way LZO is added through parcels)?
Thanks!
Created ‎01-11-2016 09:32 PM
Just build that project and use the produced jar with the suggested config
change (add "io.sensesecure.hadoop.xz.XZCodec" to io.compression.codecs).
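As a rough sketch only, the config change could look like the core-site.xml fragment below. The three stock codecs listed before the XZ codec are an assumption about what your cluster already has configured; keep whatever your existing io.compression.codecs value contains and append the new class to it. The built jar also has to be on the classpath of the clients and tasks that read or write the data.

```xml
<!-- core-site.xml fragment (sketch; adjust to your cluster) -->
<property>
  <name>io.compression.codecs</name>
  <!-- Assumption: the first three entries stand in for your current codec list.
       Only the last entry is the addition suggested in this thread. -->
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,io.sensesecure.hadoop.xz.XZCodec</value>
</property>
```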
Created ‎01-12-2016 02:58 PM
many thanks!
Created ‎01-13-2016 02:10 PM
So, I've started to play with this and ran into something interesting. When I process data compressed with LZMA, the job reads twice as much data as I actually have on HDFS.
For example, the Hadoop client (hadoop fs -du) reports a size of about 100GB.
Then I run an MR job (something like select count(1)) over this data, check the MR counters, and find that "HDFS bytes read" is twice as large (about 200GB).
With the gzip and bzip2 codecs, the file size reported by the Hadoop client and the MR counter are about the same.
