Spark Memory Monitoring: Need help with Graphite queries to monitor Spark applications

Contributor

I have configured Graphite and Grafana for monitoring Spark applications as per "https://community.hortonworks.com/articles/222813/monitoring-spark-2-performance-via-grafana-in-amba-1.html".
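
For reference, my metrics.properties follows the article and enables the Graphite sink plus the JVM source for the driver and executors; roughly like this (host and port are placeholders for my environment):

  *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
  *.sink.graphite.host=<graphite-host>
  *.sink.graphite.port=2003
  *.sink.graphite.period=10
  *.sink.graphite.unit=seconds
  *.source.jvm.class=org.apache.spark.metrics.source.JvmSource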

Are the queries below correct? ($application in all of them is the Grafana template variable; I have added a note on how it is defined after the list.)

  • Driver Memory
    • Driver Heap Usage
      aliasByNode($application.driver.jvm.heap.usage, 1)
    • Driver JVM Memory Pools Usage
      aliasByNode($application.driver.jvm.pools.*.used, 4)
  • Executor & Driver Memory Used
    aliasByNode($application.*.jvm.heap.used, 1)
  • Executor Memory Used
    aliasByNode(exclude($application.*.jvm.heap.used, '.driver.jvm.heap'), 1)
    alias(sumSeries(exclude($application.*.jvm.heap.used, '.driver.jvm.heap')), 'total')
  • Task Executor
    • Active Tasks Per Executor
      aliasByNode(summarize($application.*.executor.threadpool.activeTasks, '10s', 'sum', false), 1)
    • Completed Tasks per Executor
      aliasByNode($application.*.executor.threadpool.completeTasks, 1)
    • Completed Tasks/Minute per Executor
      aliasByNode(nonNegativeDerivative(summarize($application.*.executor.threadpool.completeTasks, '1m', 'avg', false)), 1)
  • Read/Write IOPS
    • Read IOPS
      alias(perSecond(sumSeries($application.*.executor.filesystem.hdfs.read_ops)), 'total')
      aliasByNode(perSecond($application.*.executor.filesystem.hdfs.read_ops), 1)
    • Write IOPS
      alias(perSecond(sumSeries($application.*.executor.filesystem.hdfs.write_ops)), 'total')
      aliasByNode(perSecond($application.*.executor.filesystem.hdfs.write_ops), 1)
  • HDFS Bytes Reads/Writes Per Executor
    • Executor HDFS Reads
      aliasByMetric($application.*.executor.filesystem.hdfs.read_bytes)
    • Executor HDFS Bytes Written
      aliasByMetric($application.*.executor.filesystem.hdfs.write_bytes)
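
For context, every query above uses the $application Grafana template variable. I assume it is the one from the article: a Graphite query variable that resolves to the Spark application ID at the top of the metric tree, defined roughly like this in the dashboard settings:

  Variable name:  application
  Type:           Query (Graphite data source)
  Query:          *

Please let me know if the problem is in that variable rather than in the queries themselves.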

Also, do Grafana and Graphite provide metrics for the use case below?

  • We have a number of hourly/daily batch jobs on Airflow. These batches use PySpark for data processing.
  • We want to see the historical trend of Spark memory usage for the same batch.
  • So we want to aggregate Spark applications belonging to the same batch and then visualize historical trends, so we can check how memory usage increases with traffic (one approach I am considering is sketched below).
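
One approach I am considering for this (my own sketch, not from the article, so please correct me if it is wrong): set spark.metrics.namespace to a fixed batch name, so that every run of the same Airflow batch publishes under the same Graphite path instead of under a new application ID, and then plot that path over a long time range:

  # Hypothetical batch name "daily_sales": spark.metrics.namespace replaces the
  # default application-ID prefix, so repeated runs share one Graphite subtree.
  spark-submit --conf spark.metrics.namespace=daily_sales my_batch_job.py

  # Grafana panel over a long time range, summed across driver and executors:
  alias(sumSeries(daily_sales.*.jvm.heap.used), 'daily_sales total heap used')

Is that a reasonable way to get a per-batch historical trend, or is there a better way to group applications belonging to the same batch?
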
1 REPLY

Contributor

Can someone please take a look into this?