Created 02-18-2016 09:02 AM
Hi,
Is using data compression a good practice when storing data in HDFS? I have found many different views on this. Can anyone explain whether compression is good practice or whether it should be avoided?
Created 02-18-2016 09:12 AM
Hi Rushikesh,
Hadoop jobs are data intensive, so compressing data can speed up I/O operations.
- MapReduce jobs are almost always I/O bound.
Compressed data saves storage space and speeds up data transfers across the network.
- Capital allocation for hardware can go further.
Reduced I/O and network load can bring significant performance improvements.
- MapReduce jobs can finish faster overall.
On the other hand, CPU utilization and processing time increase during compression and decompression.
- Understanding this tradeoff is important for a MapReduce pipeline's overall performance.
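To get a rough feel for the tradeoff before deciding, you could measure it on a sample of your own data. The sketch below is only an illustration: it uses Python's standard-library codecs (zlib, gzip, bz2) rather than Hadoop's codec classes, and the `measure` helper and the log-like `sample` data are made up for the example. Real ratios and timings depend heavily on your data and hardware.

```python
import bz2
import gzip
import time
import zlib


def measure(name, compress, decompress, data):
    """Return the size ratio and wall-clock timings for one codec."""
    start = time.perf_counter()
    packed = compress(data)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(packed)
    decompress_s = time.perf_counter() - start

    assert restored == data  # the round trip must be lossless
    return {
        "codec": name,
        "ratio": len(packed) / len(data),  # smaller means better compression
        "compress_s": compress_s,
        "decompress_s": decompress_s,
    }


# Hypothetical sample: repetitive log-like text, which compresses well.
sample = b"2016-02-18 09:12:00 INFO job_0001 map task finished\n" * 20000

# Three stdlib codecs standing in for "fast" versus "small" choices.
codecs = [
    ("zlib-1", lambda d: zlib.compress(d, 1), zlib.decompress),
    ("gzip-9", lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
    ("bz2", bz2.compress, bz2.decompress),
]

results = [measure(name, c, d, sample) for name, c, d in codecs]
for r in results:
    print(
        f"{r['codec']:8s} ratio={r['ratio']:.4f} "
        f"compress={r['compress_s'] * 1000:.1f} ms "
        f"decompress={r['decompress_s'] * 1000:.1f} ms"
    )
```

If the ratio is small and the timings are low compared with how long it takes to read the uncompressed data from disk or the network, compression is likely to pay off. If the job is already CPU bound, it may not.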
Created 02-20-2016 01:39 PM
@Karthik Gopal, thanks for sharing this link.