Is using data compression a good practice when storing data in HDFS?
Labels: Apache Hadoop
Created 02-18-2016 09:02 AM
Hi,
Is using data compression a good practice when storing data in HDFS? I have found many different views on this. Can anyone explain whether compression is good practice or whether it should be avoided?
Created 02-18-2016 09:12 AM
Hi Rushikesh,
Hadoop jobs are data-intensive, so compressing data can speed up I/O operations.
- MapReduce jobs are almost always I/O bound
Compressed data saves storage space and speeds up data transfers across the network.
- Hardware budgets can go further
Reduced I/O and network load can bring significant performance improvements.
- MapReduce jobs can finish faster overall
On the other hand, compression and decompression increase CPU utilization and processing time.
- Understanding this tradeoff is important for the overall performance of a MapReduce pipeline
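One way to see this tradeoff on your own data is to measure it. The sketch below is a rough illustration, not a Hadoop benchmark. It uses Python's standard-library codecs (`zlib`, `bz2`, `lzma`) as stand-ins, because the codecs commonly used with Hadoop, such as Snappy or LZO, are not in the standard library. For each codec it records the compression ratio and the wall-clock compress/decompress time. It then applies a hypothetical read-time model, "bytes stored / disk throughput + decompression time". The helper names (`measure`, `estimated_read_time`, `compare`), the 100 MB/s throughput, and the synthetic log-like sample are all assumptions made for illustration.

```python
import bz2
import lzma
import time
import zlib


def measure(name, compress, decompress, data):
    """Compress and decompress `data`; return ratio, stored size and wall-clock times."""
    start = time.perf_counter()
    packed = compress(data)
    mid = time.perf_counter()
    restored = decompress(packed)
    end = time.perf_counter()
    assert restored == data  # the round trip must be lossless
    return {
        "codec": name,
        "ratio": len(data) / len(packed),
        "compress_s": mid - start,
        "decompress_s": end - mid,
        "stored_bytes": len(packed),
    }


def estimated_read_time(result, disk_mb_per_s):
    """Hypothetical model: time to read the stored bytes plus time to decompress them."""
    io_s = result["stored_bytes"] / (disk_mb_per_s * 1_000_000)
    return io_s + result["decompress_s"]


# Standard-library codecs used as stand-ins for Hadoop codecs (an assumption).
CODECS = [
    ("none", lambda d: d, lambda d: d),
    ("zlib-1", lambda d: zlib.compress(d, 1), zlib.decompress),
    ("zlib-9", lambda d: zlib.compress(d, 9), zlib.decompress),
    ("bz2", bz2.compress, bz2.decompress),
    ("lzma", lzma.compress, lzma.decompress),
]


def compare(data, disk_mb_per_s=100.0):
    """Measure every codec on `data` and attach the estimated read time."""
    rows = []
    for name, comp, decomp in CODECS:
        row = measure(name, comp, decomp, data)
        row["est_read_s"] = estimated_read_time(row, disk_mb_per_s)
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Synthetic, repetitive log-like records; real data will compress differently.
    lines = [f"2016-02-18 09:{i % 60:02d} INFO user={i % 500} action=click\n"
             for i in range(50_000)]
    sample = "".join(lines).encode()
    for row in compare(sample):
        print(f"{row['codec']:>7}  ratio={row['ratio']:6.1f}  "
              f"compress={row['compress_s']:.3f}s  est_read={row['est_read_s']:.3f}s")
```

Faster codecs (here `zlib-1`) usually trade a lower ratio for less CPU time, while heavier ones (`bz2`, `lzma`) store fewer bytes at a higher CPU cost. Which one wins under this model depends on how the disk or network throughput compares with the codec's speed.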
Created 02-20-2016 01:39 PM
@Karthik Gopal, thanks for sharing this link.
