Support Questions
Find answers, ask questions, and share your expertise

Log management for Long-running Spark Streaming Jobs on YARN Cluster

Contributor

Hi, We need to find a way to maintain and search logs for long-running Spark streaming jobs on YARN. We have log aggregation disabled in our cluster. We are considering Solr/Elasticsearch, and possibly Flume or Kafka, to read the Spark job logs.

Any suggestions on how to implement search on these logs and manage them easily?

Thanks,

Suri

1 ACCEPTED SOLUTION


Contributor

You can achieve this by setting an appropriate value in yarn-site.xml:

yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds

Then YARN will aggregate the logs for running jobs too.

https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
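A minimal sketch of the relevant yarn-site.xml entry. The property defaults to -1 (rolling aggregation disabled); the 3600-second interval below is an example value, not something specified in this thread:

```xml
<!-- yarn-site.xml: enable rolling log aggregation for running applications.
     With a positive interval, the NodeManager uploads logs of running
     containers periodically instead of only after the application finishes.
     3600 is an illustrative value; pick an interval suited to your cluster. -->
<property>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>3600</value>
</property>
```

After changing this value, restart the NodeManagers so the new interval takes effect; the aggregated logs then become searchable (e.g. via `yarn logs -applicationId <appId>`) while the streaming job is still running.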

Suri

