Understanding the File System Counter, File: number of bytes read

New Contributor

For instance, at the end of a MapReduce job the following is reported:

File System Counters
FILE: Number of bytes read=42924972694

versus this number:

HDFS: Number of bytes read=272906990810

My assumption is that you want to minimize file system reads. If so, what are the tuning parameters, configuration items, etc. that contribute to this number, or is it all in the MapReduce job itself? I'm trying to reconcile performance differences across various clusters.


Re: Understanding the File System Counter, File: number of bytes read

Cloudera Employee

So the answer is really that what you are noticing is job specific. Depending on the job, the mappers/reducers will write more or fewer bytes to local files compared to HDFS.
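That said, since you asked about tuning parameters: the volume of map output spilled to local disk (which feeds the FILE counter) is influenced by the in-memory sort buffer settings. As a minimal sketch, assuming a standard MRv2/YARN job setup, you can adjust them on the job configuration; the values below are illustrative, not recommendations:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpillTuningSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // A larger in-memory sort buffer means map output spills to
        // local disk less often, lowering FILE bytes read/written.
        conf.set("mapreduce.task.io.sort.mb", "256");          // default is 100
        // Start spilling only when the buffer is 90% full (default 0.80).
        conf.set("mapreduce.map.sort.spill.percent", "0.90");
        Job job = Job.getInstance(conf, "spill-tuning-sketch");
        // ... set mapper, reducer, and input/output paths as usual ...
    }
}

Whether these help at all depends on the job, for the reasons below.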

In your case, a similar amount of data was read from both local and HDFS locations, and there is no problem there: your mapper code just happens to read about the same amount of data locally as it reads from HDFS. Most of the time, mappers are used to analyze an amount of data greater than their RAM, so it's not surprising to see them writing the data they get from HDFS to a local drive. The number of bytes read from HDFS and from local disk will not always sum to the local write size (and they don't in your case).
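If you want to compare these numbers across clusters programmatically rather than from the job summary, a minimal sketch (assuming the new org.apache.hadoop.mapreduce API; the class name here is just illustrative) reads the same two counters you quoted from the finished job:

import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.FileSystemCounter;
import org.apache.hadoop.mapreduce.Job;

public class FsCounterReport {
    // Call after job.waitForCompletion(true) has returned.
    static void printFsCounters(Job job) throws Exception {
        Counters counters = job.getCounters();
        long localRead = counters.findCounter("FILE", FileSystemCounter.BYTES_READ).getValue();
        long hdfsRead  = counters.findCounter("HDFS", FileSystemCounter.BYTES_READ).getValue();
        System.out.println("FILE: Number of bytes read=" + localRead);
        System.out.println("HDFS: Number of bytes read=" + hdfsRead);
    }
}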
