Created 01-24-2016 02:31 PM
Hi,
I'm wondering how to get a good idea of the performance impact of using compression in HDFS.
This matters because, if I enable compression, should I plan for 10%, 20%, or 30% more CPU to keep the same performance?
I know I can gain performance because fewer IOPS will be needed, but what about CPU?
I would also like to ask: what will the impact be on HBase performance?
Many thanks in advance!
Created 01-24-2016 02:50 PM
@Michel Sumbul Very good question. Please see this to start with: http://www.slideshare.net/Hadoop_Summit/kamat-singh-june27425pmroom210cv2
It was 3 years ago. http://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/
Created 01-24-2016 11:07 PM
@Michel Sumbul Yes, depending on the compression type.
If I understand the slides correctly, I should expect CPU usage to increase by between 5% and 60%, depending on the compression algorithm. That can be really significant!
HBASE - Link (Unofficial)
HBASE official guide - Production systems should use compression with their ColumnFamily definitions. See Appendix C, Compression In HBase for more information.
Created 01-24-2016 05:16 PM
@Michel Sumbul In terms of HBase, your mileage may vary; it depends on your workload. In some use cases I've seen much better performance with compression on, and in others not. There are also multiple levels of compression in HBase: it is configured per column family, and you can compress the row keys only, or both.
Created 01-24-2016 07:39 PM
Thanks guys for the fast reply!
@nsabharwal : If I understand the slides correctly, I should expect CPU usage to increase by between 5% and 60%, depending on the compression algorithm. That can be really significant!
@aervits : Do you have any benchmarks or test results to give an idea?
Many thanks guys!
Michel
Created 01-25-2016 01:41 PM
@Michel Sumbul I was able to reach 840k/sec reads on AWS, CentOS 7, XFS filesystem, 9 nodes, 12 7200 RPM drives, in non-MapReduce mode. On the same hardware, a write-only test resulted in 185k/sec. For a mixed workload, I got 148k/sec writes and 270k/sec reads. This is with Snappy compression on.
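For the CPU question raised earlier in the thread, one way to get a rough feel for the trade-off on your own data is to time a few codecs directly. The sketch below is only an illustration, not a Hadoop or HBase benchmark: it uses the codecs in the Python standard library (zlib, bz2, lzma) as stand-ins, since Snappy and LZO are not part of it, and the sample data and the `measure` and `benchmark` helpers are made up for this example. It reports compression ratio against CPU seconds, which is the same trade-off the slides describe.

```python
import bz2
import lzma
import time
import zlib


def measure(name, compress, data):
    """Return (name, compression ratio, CPU seconds) for one codec."""
    start = time.process_time()  # CPU time of this process, not wall-clock time
    compressed = compress(data)
    cpu_seconds = time.process_time() - start
    return name, len(data) / len(compressed), cpu_seconds


def benchmark(data):
    """Run every stand-in codec over the same input bytes."""
    codecs = [
        ("zlib-1", lambda d: zlib.compress(d, 1)),  # lowest zlib effort level
        ("zlib-9", lambda d: zlib.compress(d, 9)),  # highest zlib effort level
        ("bz2", bz2.compress),
        ("lzma", lzma.compress),
    ]
    return [measure(name, fn, data) for name, fn in codecs]


if __name__ == "__main__":
    # Hypothetical sample: repetitive log-like text. Replace with bytes read
    # from a representative file of your own.
    sample = b"2016-01-24 14:31:00 INFO region=eu user=42 action=read\n" * 20_000
    for name, ratio, cpu in benchmark(sample):
        print(f"{name:8s} ratio={ratio:8.1f}x cpu={cpu:.3f}s")
```

Results depend heavily on the data, so run it against a representative sample rather than synthetic input, and treat the numbers as indicative only before testing on the cluster itself.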