<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to Obtain total_io_mb, mapreduce_inputBytes, and mapreduce_outputBytes for Each Application in Yarn Logs? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-to-Obtain-total-io-mb-mapreduce-inputBytes-and-mapreduce/m-p/397367#M249825</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/120718"&gt;@emmanuelkatto24&lt;/a&gt;,&amp;nbsp;Welcome to our community! To help you get the best possible answer, I have tagged our Airflow experts &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/40384"&gt;@smdas&lt;/a&gt;&amp;nbsp;who may be able to assist you further.&lt;BR /&gt;&lt;BR /&gt;Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.&lt;/P&gt;</description>
    <pubDate>Tue, 12 Nov 2024 09:19:10 GMT</pubDate>
    <dc:creator>VidyaSargur</dc:creator>
    <dc:date>2024-11-12T09:19:10Z</dc:date>
    <item>
      <title>How to Obtain total_io_mb, mapreduce_inputBytes, and mapreduce_outputBytes for Each Application in Yarn Logs?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-Obtain-total-io-mb-mapreduce-inputBytes-and-mapreduce/m-p/397297#M249802</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;
&lt;P&gt;I am Emmanuel Katto, and I am currently evaluating the disk I/O of our CDH (Cloudera Distribution for Hadoop) cluster, which consists of several hundred bare-metal machines. I would like to obtain the following values for each application over a given time period:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;total_io_mb&lt;/LI&gt;
&lt;LI&gt;mapreduce_inputBytes&lt;/LI&gt;
&lt;LI&gt;mapreduce_outputBytes&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;These values, I believe, are logged in the YARN logs, but I’m not sure how to configure YARN or the logging system to ensure these values are written in the log files.&lt;/P&gt;
&lt;P&gt;So far, through Cloudera Manager, we’ve only been able to get metrics such as yarn_application_hdfs_bytes_read_rate, which is not enough to evaluate overall disk I/O.&lt;/P&gt;
&lt;P&gt;Could anyone share any advice or alternatives on how to extract these specific I/O values for each application? Also, if there’s a way to configure YARN or Cloudera Manager to write these metrics into the logs, I’d appreciate your insights.&lt;/P&gt;
&lt;P&gt;Thanks in advance!&lt;/P&gt;
&lt;P&gt;Regards&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Emmanuel Katto&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 19 Dec 2024 05:44:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-Obtain-total-io-mb-mapreduce-inputBytes-and-mapreduce/m-p/397297#M249802</guid>
      <dc:creator>emmanuelkatto24</dc:creator>
      <dc:date>2024-12-19T05:44:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to Obtain total_io_mb, mapreduce_inputBytes, and mapreduce_outputBytes for Each Application in Yarn Logs?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-Obtain-total-io-mb-mapreduce-inputBytes-and-mapreduce/m-p/397367#M249825</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/120718"&gt;@emmanuelkatto24&lt;/a&gt;,&amp;nbsp;Welcome to our community! To help you get the best possible answer, I have tagged our Airflow experts &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/40384"&gt;@smdas&lt;/a&gt;&amp;nbsp;who may be able to assist you further.&lt;BR /&gt;&lt;BR /&gt;Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Nov 2024 09:19:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-Obtain-total-io-mb-mapreduce-inputBytes-and-mapreduce/m-p/397367#M249825</guid>
      <dc:creator>VidyaSargur</dc:creator>
      <dc:date>2024-11-12T09:19:10Z</dc:date>
    </item>
    <item>
      <title>Re: How to Obtain total_io_mb, mapreduce_inputBytes, and mapreduce_outputBytes for Each Application in Yarn Logs?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-Obtain-total-io-mb-mapreduce-inputBytes-and-mapreduce/m-p/403670#M252154</link>
      <description>&lt;P&gt;I'm assuming this is a MapReduce job, since you're asking about MapReduce I/O counters.&lt;BR /&gt;&lt;BR /&gt;The script below reads the job's counters and computes the total I/O:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[hive@node4 ~]$ cat get_io_counters.sh
#!/bin/bash

# Ensure a job ID is provided
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 &amp;lt;job_id&amp;gt;"
    exit 1
fi

JOB_ID="$1"

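# Extract the "Bytes Read" and "Bytes Written" counters from the MapReduce job status.
# Each counter value is on the line after its group header, hence -A 1 and getline.
mapred job -status "$JOB_ID" | grep -E -A 1 'File Input Format Counters|File Output Format Counters' | awk -F'=' '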
  /File Input Format Counters/ {getline; bytes_read=$2}
  /File Output Format Counters/ {getline; bytes_written=$2}
  END {
    total_io_mb = (bytes_read + bytes_written) / (1024 * 1024)
    printf "BYTES_READ=%d\nBYTES_WRITTEN=%d\nTOTAL_IO_MB=%.2f\n", bytes_read, bytes_written, total_io_mb
  }'

[hive@node4 ~]$&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;Sample Output&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[hive@node4 ~]$ ./get_io_counters.sh job_1741272271547_0007
25/03/06 15:38:34 INFO client.RMProxy: Connecting to ResourceManager at node3.playground-ggangadharan.coelab.cloudera.com/10.129.117.75:8032
25/03/06 15:38:35 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
BYTES_READ=288894
BYTES_WRITTEN=348894
TOTAL_IO_MB=0.61
[hive@node4 ~]$&lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 06 Mar 2025 15:42:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-Obtain-total-io-mb-mapreduce-inputBytes-and-mapreduce/m-p/403670#M252154</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2025-03-06T15:42:46Z</dc:date>
    </item>
  </channel>
</rss>

