Created on 12-10-2019 03:53 AM - edited on 12-22-2020 11:19 PM by VidyaSargur
This video explains feasible and efficient ways to troubleshoot performance issues or perform root-cause analysis on any Spark streaming application, whose logs tend to grow past the gigabyte range. This article does not cover yarn-client mode, because yarn-cluster mode is recommended for streaming applications, for reasons that are not discussed in this article.
Spark streaming applications usually run for long periods of time before facing issues that may cause them to be shut down. In other cases, the application is not shut down but suffers performance degradation during certain peak hours. In either case, the number and size of the logs keep growing over time, and they become really difficult to analyze once they grow past the gigabyte range.
Spark, like many other applications, uses the log4j facility to handle logs for both the driver and the executors. It is therefore recommended to tune the log4j.properties file to leverage the rolling file appender, which creates a log file, rotates it when a size limit is reached, and keeps a number of backup logs as historical information that can later be used for analysis.
Updating the log4j.properties file in the Spark configuration directory is not recommended, as it has a cluster-wide effect. Instead, we can use it as a template to create our own log4j file that is used only for our streaming application, without affecting other jobs.
As an example, in this video, a log4j.properties file is created from scratch to meet the following conditions:
Each log file will have a maximum size of 100 MB, a reasonable size that can be opened in most file editors while still covering a reasonable time span of Spark events.
The latest 10 files are kept as backups for historical analysis.
The files will be saved in a custom path.
The log4j.properties file can be reused for multiple Spark streaming applications, and the log files of each application will not overwrite each other. VM properties will be used as a workaround to achieve this.
Both the driver and the executors will have their own log4j properties file. This provides flexibility in configuring the log level for specific classes, the file location, the file size, and so on.
Make the current and previous logs available on the Resource Manager UI.
Create a new log4j-driver.properties file for the driver:
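The contents of that file are not reproduced in this text, so the following is only a minimal sketch of what a driver configuration meeting the conditions above might look like, assuming Log4j 1.x syntax. The appender name `rolling` and the VM property names `vm.logging.level` and `vm.logging.name` are hypothetical placeholders chosen for illustration. `spark.yarn.app.container.log.dir` is the container log directory that Spark exposes on YARN; it is used here on the assumption that writing there is what makes the files visible in the Resource Manager UI, and a different custom path could be substituted.

```properties
# Sketch only: assumes Log4j 1.x, as bundled with Spark releases of this era.
# vm.logging.level and vm.logging.name are hypothetical JVM system properties
# that would be supplied at submit time with -D flags.
log4j.rootLogger=${vm.logging.level}, rolling

# Rotate the log once it reaches the size limit and keep a fixed number of backups.
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.MaxFileSize=100MB
log4j.appender.rolling.MaxBackupIndex=10

# Write into the YARN container log directory; the per-application name
# keeps different applications from overwriting each other's files.
log4j.appender.rolling.File=${spark.yarn.app.container.log.dir}/${vm.logging.name}-driver.log
log4j.appender.rolling.Encoding=UTF-8

# Timestamp, level, short logger name, message.
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

Under these assumptions, the file would be shipped with `--files` and selected by adding `-Dlog4j.configuration=log4j-driver.properties`, `-Dvm.logging.level=INFO`, and `-Dvm.logging.name=myapp` (a hypothetical application name) to `spark.driver.extraJavaOptions`. An executor file could be identical apart from the `-driver` suffix in the file name and be selected through `spark.executor.extraJavaOptions`.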
After the Spark streaming application has run, the log files are listed on each NodeManager node where an executor was launched, which makes it easier to find and collect the necessary executor logs. In addition, the Resource Manager UI lists the current log and any previous (backup) files.