What determines the value of "HDFS: Number of Read Operations" counter in a M/R job?

I have a map-only M/R job that runs against the same local input files on two different HDP clusters. Both clusters have exactly the same configuration.

In cluster1, HDFS: Number of Read Operations = number of mappers * 9

In cluster2, HDFS: Number of Read Operations = number of mappers * 10.

The job runs ~30% faster on cluster1 than on cluster2.

These multiplication factors stay the same when the number of mappers is increased or decreased. I have checked pretty much all the other configs and couldn't find any difference in configs or files.
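To make the check concrete, here is a minimal sketch of how the per-mapper factor can be computed, assuming the counter total and the mapper count are copied by hand from each job's counter summary. The function name and the sample numbers are hypothetical and only illustrate the 9x and 10x pattern described above; they are not output from the actual runs.

```python
def reads_per_mapper(total_read_ops: int, num_mappers: int) -> float:
    """Return the HDFS read operations attributed to each mapper on average."""
    if num_mappers <= 0:
        raise ValueError("num_mappers must be positive")
    return total_read_ops / num_mappers


# Hypothetical (counter total, mapper count) pairs for two runs per cluster.
cluster1_runs = [(90, 10), (180, 20)]
cluster2_runs = [(100, 10), (200, 20)]

# A single value in each set means the factor is independent of mapper count.
factors1 = {reads_per_mapper(total, mappers) for total, mappers in cluster1_runs}
factors2 = {reads_per_mapper(total, mappers) for total, mappers in cluster2_runs}

print("cluster1 factors:", factors1)
print("cluster2 factors:", factors2)
```

A constant factor across runs suggests that each map task issues a fixed number of HDFS read calls, so the difference of one call per task is what needs explaining.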

I would like to know what determines this multiplication factor (9 on cluster1 and 10 on cluster2). I have already spent some time trying to get equal performance on both clusters. Any input is appreciated. Thanks.