HDFS Compression vs Performance

Rising Star

Hi,

How can I get a good idea of the performance impact of using compression in HDFS?

This matters because, if I enable compression, should I plan for 10%, 20%, or 30% more CPU to keep the same performance?

I know that I can gain performance because fewer IOPS will be needed, but what about CPU?

I would also like to ask: what will the impact on HBase performance be?

Many thanks in advance!

Master Mentor

@Michel Sumbul Yes, depending on the compression type.


HBASE - Link (Unofficial)

HBASE official guide - Production systems should use compression with their ColumnFamily definitions. See Appendix C, Compression In HBase for more information.

Master Mentor

@Michel Sumbul In terms of HBase, your mileage may vary; it depends on your workload. In some use cases I've seen much better performance with compression on, and in others not. There are also multiple levels of compression in HBase (per column family, you can compress rowkeys only or both).

Rising Star

Thanks guys for the fast reply!

@nsabharwal : If I understand the slides correctly, I should expect CPU usage to rise by between 5% and 60%, depending on the compression algorithm. That can be really significant!

@aervits : do you have any benchmarks or test results to give me an idea?

Many thanks guys!

Michel

Master Mentor

@Michel Sumbul I was able to reach 840k reads/sec on AWS (CentOS 7, XFS filesystem, 9 nodes, 12 7200 RPM drives) in non-MapReduce mode. On the same hardware, a write-only test reached 185k writes/sec. For a mixed workload, I got 148k writes/sec and 270k reads/sec. All of this was with Snappy compression on.
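
On the CPU question raised in this thread, one rough way to get a feel for the trade-off before touching a cluster is to time a few codecs on a sample of your own data. The sketch below is only an illustration: it uses the codecs that ship with Python (zlib, bz2, lzma) as stand-ins, since Snappy and LZO are not in the standard library, and the sample data and the function names (`make_sample`, `measure`, `run_benchmark`) are made up for this example. The absolute numbers will not match what Hadoop or HBase codecs do on real hardware; the point is the shape of the trade-off between compression ratio and CPU time.

```python
import bz2
import lzma
import os
import time
import zlib


def make_sample(size_bytes: int = 1_000_000) -> bytes:
    """Build hypothetical sample data: half repeated log-like text, half random bytes."""
    line = b"2016-01-01 12:00:00 INFO example: wrote one record to disk\n"
    text = line * (size_bytes // (2 * len(line)))
    noise = os.urandom(size_bytes - len(text))  # random bytes barely compress
    return text + noise


def measure(name, compress, data):
    """Compress once and report compression ratio and CPU seconds consumed."""
    start = time.process_time()  # CPU time of this process, not wall-clock time
    packed = compress(data)
    cpu_seconds = time.process_time() - start
    return {
        "codec": name,
        "ratio": len(data) / len(packed),
        "cpu_seconds": cpu_seconds,
    }


def run_benchmark(data):
    """Run every stand-in codec over the same input and collect the results."""
    codecs = [
        ("zlib-1", lambda d: zlib.compress(d, 1)),  # fastest zlib level
        ("zlib-9", lambda d: zlib.compress(d, 9)),  # best-ratio zlib level
        ("bz2", bz2.compress),
        ("lzma", lzma.compress),
    ]
    return [measure(name, fn, data) for name, fn in codecs]


if __name__ == "__main__":
    for row in run_benchmark(make_sample()):
        print(f"{row['codec']:8s} ratio={row['ratio']:.2f} cpu={row['cpu_seconds']:.3f}s")
```

Replacing `make_sample()` with bytes read from a representative file should give a more meaningful comparison. Measuring CPU time instead of wall-clock time keeps disk and scheduling noise out of the result, which is closer to the "how much more CPU do I need" question.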