How to monitor the actual memory allocation of a spark application


Is there a proper way to monitor the memory usage of a Spark application?

By memory usage, I don't mean the executor memory, which can be set, but the actual memory usage of the application.

Note: We are running Spark on YARN.


@Nikhil Have you checked the Executors tab in the Spark UI? Does that help? The RM UI also displays the total memory per application.

HTH


@Felix Albani ... sorry for the delay in getting back.

Spark UI - Checking the Spark UI is not practical in our case.

RM UI - The YARN UI seems to display the total memory consumption of the Spark app, i.e. executors plus driver combined. From that, how can we separate out the actual memory usage of the executors?

I have run a sample Pi job. Could you please let me know how to get the actual memory consumption of the executors?

spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 1 --driver-memory 512m --executor-memory 1024m --executor-cores 1 /usr/hdp/2.6.3.0-235/spark2/examples/jars/spark-examples*.jar 10

Application Id : application_1530502574422_0004

Application Stats

{"app":{"id":"application_1530502574422_0004","user":"ec2-user","name":"Spark Pi","queue":"default","state":"FINISHED","finalStatus":"SUCCEEDED","progress":100.0,"trackingUI":"History","trackingUrl":"http://dev-data-platform-hdp-master002-1c.data-eng.io:8088/proxy/application_1530502574422_0004/","diagnostics":"","clusterId":1530502574422,"applicationType":"SPARK","applicationTags":"","priority":0,"startedTime":1540731128244,"finishedTime":1540731139717,"elapsedTime":11473,"amContainerLogs":"http://dev-data-platform-hdp-worker001-1c.data-eng.io:8042/node/containerlogs/container_e28_1530502574422_0004_01_000001/ec2-user","amHostHttpAddress":"dev-data-platform-hdp-worker001-1c.data-eng.io:8042","allocatedMB":-1,"allocatedVCores":-1,"runningContainers":-1,"memorySeconds":21027,"vcoreSeconds":15,"queueUsagePercentage":0.0,"clusterUsagePercentage":0.0,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"logAggregationStatus":"SUCCEEDED","unmanagedApplication":false,"amNodeLabelExpression":""}}
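As an aside, the stats above come from the ResourceManager REST API (/ws/v1/cluster/apps/&lt;appId&gt;). One coarse number you can derive from them is the average memory *allocated* to the app over its lifetime: memorySeconds (MB-seconds) divided by the elapsed time. A minimal Python sketch, using just the two relevant fields from the JSON above (note this reflects allocation, not actual usage):

```python
import json

# Two fields from the RM REST API response above (/ws/v1/cluster/apps/<appId>);
# memorySeconds is MB-seconds of *allocated* memory, elapsedTime is in ms.
stats = json.loads('{"app": {"memorySeconds": 21027, "elapsedTime": 11473}}')

app = stats["app"]
avg_allocated_mb = app["memorySeconds"] / (app["elapsedTime"] / 1000.0)
print(round(avg_allocated_mb))  # -> 1833 (allocation, not actual usage)
```

That lines up roughly with the 512m driver plus 1024m executor plus container overheads, which is exactly why allocation figures don't answer the original question about actual usage.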

Executor details from Spark History Web UI



@Felix Albani ... did you have a chance to look into it?

Hi @Nikhil

If you want to follow the memory usage of individual executors in Spark, one way is to configure the Spark metrics properties. I've previously posted the following guide, which may help you set this up if it fits your use case:

https://community.hortonworks.com/articles/222813/monitoring-spark-2-performance-via-grafana-in-amba...

I've just whipped up an example chart showing the individual driver and executor total memory usage for a simple Spark application.


You can adjust the above example according to your needs: a total across executors, executors plus driver combined, or keeping them individual, etc.
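For anyone landing here later: the guide above walks through the setup in detail, but for reference, the Graphite sink is enabled through Spark's metrics.properties. A minimal sketch (the host is a placeholder; adjust period and prefix to taste):

```properties
# Send all Spark metrics to a Graphite/Carbon endpoint
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
*.sink.graphite.prefix=spark

# Enable the JVM source so driver/executor heap metrics are emitted
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
```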


@Jonathan Sneep ... thanks for the input. Will check this and let you know.


@Jonathan Sneep ... is there proper documentation for installing and configuring Graphite on an Amazon AMI?

@Nikhil
Not sure if there is any official documentation on that. I had a quick look and came across this on GitHub, which looks good and straightforward to me.


@Jonathan Sneep ... I already followed that, but I'm getting the below error while installing the packages:

Error: Package: python-django-tagging-0.3.1-7.el6.noarch (epel)
           Requires: Django
 You could try using --skip-broken to work around the problem

Amazon Ami - Amazon Linux AMI 2017.03.1.20170812 x86_64 HVM GP2


@Jonathan Sneep
I have managed to configure and integrate the Spark app, Grafana, and Graphite.

Could you please let me know how to configure the metrics to get graphs of the below:

  • HDFS Bytes Read/Written Per Executor
  • HDFS Executor Read/Write Bytes/Sec
  • Read IOPS
  • Task Executor
    • Active Tasks per Executor
    • Completed Tasks per Executor
    • Completed Tasks/Minute per Executor
  • Driver Memory
    • Driver Heap Usage
    • Driver JVM Memory Pools Usage
    • Executor Memory Usage
    • JVM Heap Usage Per Executor
  • Spark Executor and Driver memory used
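As an aside, once the sink is feeding Graphite, any of the targets above can also be pulled programmatically through Graphite's render endpoint instead of only charting them in Grafana. A minimal Python sketch (the Graphite host and the metric path are placeholders, not from this thread):

```python
from urllib.parse import urlencode

# Hypothetical Graphite host; replace with your own.
GRAPHITE = "http://graphite.example.com"

def render_url(target, frm="-1h", fmt="json"):
    # /render is Graphite's standard query endpoint; "from" selects the window.
    return f"{GRAPHITE}/render?{urlencode({'target': target, 'from': frm, 'format': fmt})}"

# Placeholder application id "app-123"; any target expression from this thread works.
url = render_url("aliasByNode(app-123.*.jvm.heap.used, 1)")
print(url)
```

Fetching that URL (e.g. with urllib.request or curl) returns the datapoints as JSON, which is handy for alerting or ad-hoc analysis outside Grafana.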


@Jonathan Sneep

I am not able to reply to your comment; the Reply option seems to be unavailable. Hence replying here.

Quoting your comment:

HDFS write bytes per executor should look something like this (be sure to set the left Y-axis unit type to bytes):

aliasByNode($application.*.executor.filesystem.*.write_bytes, 1)

Executor and driver memory usage example (as above, set the left Y-axis unit to bytes):

aliasByNode($application.*.jvm.heap.used, 1)

I'll try to find time later to give you some more examples, but they are mostly slight variations on the examples above : - )
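For what it's worth, the second argument to aliasByNode is the 0-based index of the dot-separated segment of the metric path used as the series label, which is why 1 in the queries above yields the driver or executor id. A tiny Python sketch of that selection logic (the sample paths are illustrative, not from this cluster):

```python
def alias_by_node(metric_path, index):
    """Mimic how Graphite's aliasByNode picks a series label:
    split the metric path on dots and take the 0-based segment."""
    return metric_path.split(".")[index]

# Segment 1 of a Spark metric path is the driver/executor id.
print(alias_by_node("application_123.driver.jvm.heap.used", 1))  # -> driver
print(alias_by_node("application_123.1.jvm.heap.used", 1))       # -> 1
```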

Thanks for the comment. I will try the metrics queries and let you know.

Also looking forward to your updates on the below:

  • HDFS Bytes Read/Written Per Executor
  • HDFS Executor Read Bytes/Sec
  • Read IOPS
  • Task Executor
    • Active Tasks per Executor
    • Completed Tasks per Executor
    • Completed Tasks/Minute per Executor
  • Driver Memory
    • Driver Heap Usage
    • Driver JVM Memory Pools Usage
    • Executor Memory Usage
    • JVM Heap Usage Per Executor


@Jonathan Sneep
Could you please check whether the below metrics queries are correct:

  • Driver Memory
    • Driver Heap Usage
      aliasByNode($application.driver.jvm.heap.usage, 1)
    • Driver JVM Memory Pools Usage
      aliasByNode($application.driver.jvm.pools.*.used, 4)
  • Executor & Driver Memory Used
    aliasByNode($application.*.jvm.heap.used, 1)
  • Executor Memory Used
    aliasByNode(exclude($application.*.jvm.heap.used, '.driver.jvm.heap'), 1)
    alias(sumSeries(exclude($application.*.jvm.heap.used, '.driver.jvm.heap')), 'total')
  • Task Executor
    • Active Tasks Per Executor
      aliasByNode(summarize($application.*.executor.threadpool.activeTasks, '10s', 'sum', false), 1)
    • Completed Tasks per Executor
      aliasByNode($application.*.executor.threadpool.completeTasks, 1)
    • Completed Tasks/Minute per Executor
      aliasByNode(nonNegativeDerivative(summarize($application.*.executor.threadpool.completeTasks, '1m', 'avg', false)), 1)
  • Read/Write IOPS
    • Read IOPS
      alias(perSecond(sumSeries($application.*.executor.filesystem.hdfs.read_ops)), 'total')
      aliasByNode(perSecond($application.*.executor.filesystem.hdfs.read_ops), 1)
    • Write IOPS
      alias(perSecond(sumSeries($application.*.executor.filesystem.hdfs.write_ops)), 'total')
      aliasByNode(perSecond($application.*.executor.filesystem.hdfs.write_ops), 1)
  • HDFS Bytes Reads/Writes Per Executor
    • Executor HDFS Reads
      aliasByMetric($application.*.executor.filesystem.hdfs.read_bytes)
    • Executor HDFS Bytes Written
      aliasByMetric($application.*.executor.filesystem.hdfs.write_bytes)

Also, please let me know the queries for the below:

  • HDFS Read/Write Byte Rate
    • HDFS Read Rate/Sec
    • HDFS Write Rate/Sec

Looking forward to your update regarding the same.


@Jonathan Sneep

Did you have a chance to look into it?

@Nikhil

Nice work.
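By analogy with the IOPS queries you already have, the HDFS read/write byte rates should just be perSecond applied to the byte counters rather than the op counters. An untested sketch, assuming the same metric namespace:

```
aliasByNode(perSecond($application.*.executor.filesystem.hdfs.read_bytes), 1)
aliasByNode(perSecond($application.*.executor.filesystem.hdfs.write_bytes), 1)
```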

