Log aggregation for Long running Spark Streaming jobs
- Labels: Apache Spark, Apache YARN
Created on ‎02-21-2017 01:49 PM - edited ‎09-16-2022 04:08 AM
The documentation for YARN log aggregation says that logs are aggregated after an application completes.
Does this rule out YARN log aggregation for Spark Streaming jobs? Streaming jobs run for a much longer duration and, in theory, never terminate, so I want to get the Spark Streaming logs into HDFS before the job completes. Is there a good way to get Spark log data into HDFS while the job is still running?
Suri
Created ‎02-21-2017 02:25 PM
You can achieve this by setting an appropriate value for the following property in yarn-site.xml:
yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds
YARN will then aggregate the logs for running jobs too.
https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
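For reference, the property goes into yarn-site.xml on each NodeManager (a restart is required, and yarn.log-aggregation-enable must already be true). The interval below is an illustrative value, not a recommendation; the default of -1 disables rolling aggregation entirely:

```xml
<!-- yarn-site.xml: roll up logs of still-running applications
     every hour instead of waiting for application completion.
     3600 is an illustrative value; the default (-1) means logs
     are only aggregated after the application finishes. -->
<property>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>3600</value>
</property>
```

Note that very small intervals increase NodeManager and HDFS load, since each roll writes partial log files to the aggregation directory.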
Suri
Created ‎02-07-2019 07:59 PM
Hi,
We are running a Spark Streaming job on a cluster managed by CM 6. After the job has been running for 4-5 days, the Spark UI for that particular job no longer opens, and my nohup driver output file repeatedly logs lines like this:
servlet.ServletHandler: Error for /streaming/
java.lang.OutOfMemoryError: Java heap space
The job itself keeps running fine; I just cannot open the UI by clicking the ApplicationMaster link from the YARN Running Applications page.
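Not an answer from this thread, but one common cause of this symptom: the Spark UI is served from the driver, and the driver heap accumulates retained job, stage, and batch history over days of streaming. A sketch of settings that cap that retained state and give the driver more headroom (the values below are illustrative assumptions, not tuned recommendations):

```properties
# spark-defaults.conf (or pass as --conf flags to spark-submit).
# Illustrative values: limit how much UI history the driver
# retains so long-running streaming jobs don't exhaust the heap.
spark.ui.retainedJobs                200
spark.ui.retainedStages              200
spark.ui.retainedTasks               10000
spark.streaming.ui.retainedBatches   100
# Give the driver (which serves the UI) more heap room.
spark.driver.memory                  4g
```

These properties all exist in Spark's configuration; the right values depend on your batch interval and job complexity, so treat the numbers above as a starting point for experimentation.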
