In certain large environments, it is very easy for the Spark History Server to get overwhelmed by the large number of applications posting history data and the number of users and developers viewing that data.

Spark jobs create an artifact called the history (event log) file, which the Spark History Server (SHS) parses and serves via its UI. The size of this file has a huge impact on the load placed on the SHS. Note that the size of the history file is determined by the number of events the application generates; a small executor heartbeat interval, for example, produces many more events and therefore a larger file.
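The event volume is partly tunable. As a sketch, the relevant setting in `spark-defaults.conf` looks like the fragment below (the value shown is Spark's documented default; treat it as illustrative, not a recommendation for your cluster):

```
# spark-defaults.conf fragment (illustrative).
# Executor heartbeats generate events that land in the history file;
# shortening this interval inflates the file, lengthening it shrinks it.
# 10s is the Spark default.
spark.executor.heartbeatInterval  10s
```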

Workaround:

If you are still interested in analyzing performance issues with these large history files, one option is to download the files and browse them from a locally hosted SHS instance. To set this up:

  1. Download Spark 1.6 from https://spark.apache.org/downloads.html
  2. Unpack the tarball
  3. Create a directory to hold the logs, called spark-logs
  4. Create a properties file called test.properties
  5. Inside test.properties, add spark.history.fs.logDirectory=<path to the spark-logs directory>
  6. Run <spark download>/sbin/start-history-server.sh --properties-file <path to test.properties>
  7. Open a web browser and visit http://localhost:18080
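The steps above can be sketched as a short shell session. The paths here are placeholders (adjust them to wherever you unpacked Spark), and the final command is shown commented out since it assumes the download from step 1 is in place:

```shell
# Directory that will hold the downloaded history files.
mkdir -p "$HOME/spark-logs"

# Minimal properties file pointing the history server at that directory.
cat > "$HOME/test.properties" <<EOF
spark.history.fs.logDirectory=file://$HOME/spark-logs
EOF

# Start the server from the unpacked Spark directory, then browse
# to http://localhost:18080:
# ./sbin/start-history-server.sh --properties-file "$HOME/test.properties"
```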

Once done, you can download Spark history files from HDFS and copy them into this directory. The running Spark History Server will load the files dynamically as they appear in the spark-logs directory.
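A typical copy looks like the command below. The application ID and the HDFS event-log path are hypothetical; check `spark.eventLog.dir` in your cluster's spark-defaults.conf for the real location:

```
# Copy one application's history file from HDFS into the local log dir.
# Path and application ID are examples only.
hdfs dfs -copyToLocal /spark-history/application_1514764800000_0001 "$HOME/spark-logs/"
```

The server picks the new file up on its next scan of the log directory, so no restart is needed.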

Version history
Revision #: 1 of 1
Last update: 01-02-2018 09:27 PM