Hello everyone,
I am Emmanuel Katto, and I am currently evaluating the disk I/O of our CDH (Cloudera Distribution for Hadoop) cluster, which consists of several hundred bare-metal machines. I would like to obtain the following values for each application over a given period of time:
- total_io_mb
- mapreduce_inputBytes
- mapreduce_outputBytes
I believe these values are written to the YARN logs, but I’m not sure how to configure YARN or the logging system to ensure that they actually appear in the log files.
So far, through Cloudera Manager, we’ve only been able to get metrics such as yarn_application_hdfs_bytes_read_rate, which is not enough to evaluate overall disk I/O.
Could anyone share advice or alternatives for extracting these specific I/O values for each application? Also, if there’s a way to configure YARN or Cloudera Manager to write these metrics to the logs, I’d appreciate your insights.
Thanks in advance!
Regards,
Emmanuel Katto