Is using data compression a good practice when storing data in HDFS?

Hi,

Is using data compression a good practice when storing data in HDFS? I have found many different views on this. Can anyone explain whether it is good practice to use compression, or whether it should be avoided?

1 ACCEPTED SOLUTION

Hi Rushikesh,

Hadoop jobs are data intensive, so compressing data can speed up I/O operations.

- MapReduce jobs are almost always I/O bound

Compressed data saves storage space and speeds up data transfers across the network.

- Capital allocation for hardware can go further

Reduced I/O and network load can bring significant performance improvements.

- MapReduce jobs can finish faster overall

On the other hand, CPU utilization and processing time increase during compression and decompression.

- Understanding the tradeoff is important for a MapReduce pipeline's overall performance
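
To get a rough feel for this tradeoff on your own data, a small sketch like the one below can help. It is not Hadoop code: it only uses Python's standard zlib and bz2 modules as stand-ins for cluster codecs, and the sample log line and the measure helper are made up for illustration. It reports how much each codec shrinks the data and how much CPU time the compression and decompression cost.

```python
import bz2
import time
import zlib


def measure(name, compress, decompress, data):
    """Time one compress/decompress round trip and report the size ratio."""
    start = time.perf_counter()
    packed = compress(data)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(packed)
    decompress_s = time.perf_counter() - start

    # Sanity check: the round trip must give back the original bytes.
    assert restored == data

    return {
        "codec": name,
        "ratio": len(packed) / len(data),  # smaller means better compression
        "compress_s": compress_s,
        "decompress_s": decompress_s,
    }


# Hypothetical sample: a repetitive log-like record, roughly 0.7 MB in total.
sample = b"2016-01-01,INFO,user=42,action=login\n" * 20000

results = [
    measure("zlib", zlib.compress, zlib.decompress, sample),
    measure("bz2", bz2.compress, bz2.decompress, sample),
]

for r in results:
    print(
        f"{r['codec']}: ratio={r['ratio']:.4f} "
        f"compress={r['compress_s']:.4f}s decompress={r['decompress_s']:.4f}s"
    )
```

The numbers depend heavily on the data and the machine, so treat them as a starting point. If the bytes saved on disk and network are worth more than the extra CPU time, compression is likely to pay off for that dataset.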

3 REPLIES

@Karthik Gopal, thanks for sharing this link.