
API to help pull YARN metrics and RM metrics


Hello all,

I am trying to create a script that can pull all the ResourceManager / History Server data via the API for a period of 24 hours. I want the output in JSON format, which I can later parse and persist for trend analysis.

Any idea how I can proceed with that? Any pointers will be very helpful.

Thanks.

1 ACCEPTED SOLUTION

Super Collaborator

Hi @akash sharma,

The requested functionality is implemented in SmartSense Activity Explorer, which is an out-of-the-box solution. Please have a look at it, as it comes with some pre-built trend analysis reports that will help assess capacity planning.

However, if you still wish to implement an in-house solution, you can get the metrics out of the cluster using the YARN REST API.

The following URL has the REST specs, so you can get the required information.

https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
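For the 24-hour window asked about in the question, the Cluster Applications API (`/ws/v1/cluster/apps`) described on that page accepts `finishedTimeBegin` and `finishedTimeEnd` query parameters, both in milliseconds since the epoch. Below is a minimal sketch of building such a path; the object name `AppsWindow` and the method `appsFinishedWithin` are made up for illustration and are not part of any YARN library.

```scala
import java.time.{Duration, Instant}

// Hypothetical helper, not part of YARN: builds the path and query string
// for applications that finished inside the window [end - window, end].
object AppsWindow {
  def appsFinishedWithin(end: Instant, window: Duration): String = {
    val begin = end.minus(window)
    // Both bounds are expressed in milliseconds since the epoch.
    s"/ws/v1/cluster/apps?finishedTimeBegin=${begin.toEpochMilli}&finishedTimeEnd=${end.toEpochMilli}"
  }
}
```

The resulting string can be appended to the RM base address, for example `AppsWindow.appsFinishedWithin(Instant.now(), Duration.ofHours(24))`.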

The following code is a Scala snippet that extracts the same in JSON format, which can eventually be loaded into a database or another location.

  import java.net.{HttpURLConnection, URL}

  // Returns the first ResourceManager URL that answers: rmHost1 first, then rmHost2.
  // `logger` is assumed to be a logger defined elsewhere in the enclosing class.
  def getURL(rmHost1: String, rmHost2: String, urlExtension: String): URL = {

    val url1 = new URL(rmHost1 + urlExtension)
    val url2 = new URL(rmHost2 + urlExtension)

    try {
      // getResponseCode forces the connection, so an unreachable host throws here
      url1.openConnection().asInstanceOf[HttpURLConnection].getResponseCode
      url1
    } catch {
      case e: Exception =>
        logger.info("Unable to connect to primary RM host, trying secondary RM host")
        try {
          url2.openConnection().asInstanceOf[HttpURLConnection].getResponseCode
          url2
        } catch {
          case f: Exception =>
            logger.info("Unable to connect to either of the RM hosts, hence terminating")
            logger.error("Primary host stack trace", e)
            logger.error("Secondary host stack trace", f)
            throw f
        }
    }
  }

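  import java.net.HttpURLConnection
  import java.nio.charset.StandardCharsets
  import java.util.Properties
  import org.apache.commons.io.IOUtils // assuming IOUtils is the Apache Commons IO class

  // Fetches urlExtension from whichever RM host responds and returns the response body.
  def loadRM(props: Properties, urlExtension: String): String = {

    val url = getURL(props.getProperty("resoucemanagerHost1"), props.getProperty("resoucemanagerHost2"), urlExtension)
    if (url == null) { return null }
    try {
      val urlContext = url.openConnection().asInstanceOf[HttpURLConnection]
      val resStr = IOUtils.toString(urlContext.getInputStream, StandardCharsets.UTF_8)
      urlContext.disconnect()
      resStr
    } catch {
      case e: Exception =>
        logger.error("Unable to load the URL : " + url.toString, e)
        throw e
    }
  }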




  // Fragment from the body of a Unit-returning method;
  // appFunctions, properties and logger are defined elsewhere.
  val dat = appFunctions.loadRM(properties, "/ws/v1/cluster/appstatistics")
  if (dat == null) {
    logger.error("Could not get payload")
    return
  }
  val payload =
    try {
      new JSONObject(dat).getJSONObject("appStatInfo").getJSONArray("statItem")
    } catch {
      case e: Exception =>
        logger.error("Unable to extract the content from JSON for application statistics: " + dat, e)
        return
    }
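
For the "persist for trend analysis" part, one simple option (an assumption for illustration, not something the snippet above does) is to append each raw JSON response to a local file together with the time it was fetched, one snapshot per line. The object name `SnapshotStore` and the tab-separated layout are hypothetical choices; a database load would replace this step.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}

// Hypothetical helper: stores one snapshot per line as "<epochMillis>\t<json>".
object SnapshotStore {
  def append(file: Path, epochMillis: Long, json: String): Unit = {
    // Collapse line breaks so a pretty-printed response still fits on one line.
    val oneLine = json.replaceAll("\\s*[\\r\\n]+\\s*", " ")
    val record = s"$epochMillis\t$oneLine\n"
    // CREATE makes the file on first use, APPEND keeps earlier snapshots.
    Files.write(file, record.getBytes(StandardCharsets.UTF_8),
      StandardOpenOption.CREATE, StandardOpenOption.APPEND)
  }
}
```

Calling `SnapshotStore.append(path, System.currentTimeMillis(), dat)` after each successful `loadRM` call would build up a time series that can be parsed later.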
