I have configured Graphite and Grafana to monitor our Spark applications, following "https://community.hortonworks.com/articles/222813/monitoring-spark-2-performance-via-grafana-in-amba-1.html".
Are the queries below correct?
- Driver Memory
- Driver Heap Usage
  aliasByNode($application.driver.jvm.heap.usage, 1)
- Driver JVM Memory Pools Usage
  aliasByNode($application.driver.jvm.pools.*.used, 4)
- Executor & Driver Memory Used
  aliasByNode($application.*.jvm.heap.used, 1)
- Executor Memory Used
  aliasByNode(exclude($application.*.jvm.heap.used, '.driver.jvm.heap'), 1)
  alias(sumSeries(exclude($application.*.jvm.heap.used, '.driver.jvm.heap')), 'total')
- Task Executor
- Active Tasks Per Executor
  aliasByNode(summarize($application.*.executor.threadpool.activeTasks, '10s', 'sum', false), 1)
- Completed Tasks per Executor
  aliasByNode($application.*.executor.threadpool.completeTasks, 1)
- Completed Tasks/Minute per Executor
  aliasByNode(nonNegativeDerivative(summarize($application.*.executor.threadpool.completeTasks, '1m', 'avg', false)), 1)
- Read/Write IOPS
- Read IOPS
  alias(perSecond(sumSeries($application.*.executor.filesystem.hdfs.read_ops)), 'total')
  aliasByNode(perSecond($application.*.executor.filesystem.hdfs.read_ops), 1)
- Write IOPS
  alias(perSecond(sumSeries($application.*.executor.filesystem.hdfs.write_ops)), 'total')
  aliasByNode(perSecond($application.*.executor.filesystem.hdfs.write_ops), 1)
- HDFS Bytes Reads/Writes Per Executor
- Executor HDFS Reads
  aliasByMetric($application.*.executor.filesystem.hdfs.read_bytes)
- Executor HDFS Bytes Written
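To sanity-check what the rate-style queries above (Completed Tasks/Minute, Read IOPS, Write IOPS) return, here is a simplified Python model of Graphite's nonNegativeDerivative and perSecond. It assumes the executor values (completeTasks, read_ops, write_ops) are cumulative within one application run, and it deliberately omits the optional maxValue/minValue handling, so treat it as an illustration rather than graphite-web's actual implementation. The function names non_negative_derivative and per_second are hypothetical.

```python
from typing import List, Optional

Point = Optional[float]


def non_negative_derivative(values: List[Point]) -> List[Point]:
    """Simplified model of Graphite's nonNegativeDerivative (no maxValue/minValue).

    Each output point is the increase since the previous point. The first point,
    points next to a gap (None), and decreases (e.g. a counter that restarts with
    a new executor) yield None instead of a negative number.
    """
    out: List[Point] = []
    prev: Point = None
    for val in values:
        if prev is None or val is None or val < prev:
            out.append(None)
        else:
            out.append(val - prev)
        prev = val
    return out


def per_second(values: List[Point], step_seconds: float) -> List[Point]:
    """Simplified model of Graphite's perSecond: the non-negative delta divided by the series step."""
    return [None if d is None else d / step_seconds for d in non_negative_derivative(values)]
```

For example, a read_ops series sampled every 10 s as [100, 150, 150, 30, 90] gives [None, 50, 0, None, 60] from non_negative_derivative and [None, 5.0, 0.0, None, 6.0] from per_second(..., 10). The drop from 150 to 30 shows up as a gap rather than a negative spike, which is why these functions suit cumulative counters but not gauges such as activeTasks.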
Also, can Graphite and Grafana provide metrics for the following use case?
- We have a number of hourly and daily batches on Airflow. These batches use PySpark for data processing.
- We want to see the historical trend of Spark memory usage for the same batch.
- So we want to aggregate the Spark applications that belong to the same batch and then visualize historical trends, so we can check how memory usage increases with traffic.
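One possible approach, sketched under assumptions: by default Spark uses spark.app.id as the metrics namespace, so every Airflow run shows up as a new $application node. Setting spark.metrics.namespace to a stable per-batch name makes every run of the same batch write to the same Graphite series, which Grafana can then plot over weeks. The helper names batch_metrics_conf and daily_peak_heap_target, the batch name, the "spark" prefix, port 2003 and the 10-second period are all hypothetical, not values from the article.

```python
from typing import Dict


def batch_metrics_conf(batch_name: str, graphite_host: str, graphite_port: int = 2003) -> Dict[str, str]:
    """Hypothetical helper: Spark settings so all runs of one batch report under one Graphite path."""
    # Graphite splits paths on '.', so keep the batch name to a single safe node.
    safe_name = "".join(c if c.isalnum() or c in "_-" else "_" for c in batch_name)
    return {
        # Replaces the default namespace (spark.app.id) with a stable per-batch name.
        "spark.metrics.namespace": safe_name,
        "spark.metrics.conf.*.sink.graphite.class": "org.apache.spark.metrics.sink.GraphiteSink",
        "spark.metrics.conf.*.sink.graphite.host": graphite_host,
        "spark.metrics.conf.*.sink.graphite.port": str(graphite_port),
        "spark.metrics.conf.*.sink.graphite.period": "10",  # assumed reporting period
        "spark.metrics.conf.*.sink.graphite.unit": "seconds",
        "spark.metrics.conf.*.sink.graphite.prefix": "spark",  # assumed prefix
        # Enables the jvm.heap.* metrics on the driver and executors.
        "spark.metrics.conf.*.source.jvm.class": "org.apache.spark.metrics.source.JvmSource",
    }


def daily_peak_heap_target(namespace: str, prefix: str = "spark") -> str:
    """Hypothetical helper: Graphite target summing heap used over driver and executors, max per day."""
    return (
        f"alias(summarize(sumSeries({prefix}.{namespace}.*.jvm.heap.used), '1d', 'max', false), "
        f"'{namespace} daily peak heap')"
    )
```

The returned settings could be passed as `--conf key=value` pairs to spark-submit in the Airflow task, or through `SparkSession.builder.config(key, value)` in PySpark. The target string can go into a Grafana panel with a long time range. One trade-off: if two runs of the same batch overlap, their series collide under one namespace. Also, how far back the trend reaches depends on the retention configured for these paths in Graphite's storage-schemas.conf.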