Log management for long-running Spark Streaming jobs on a YARN cluster

Hi, we need a way to store and search the logs of long-running Spark Streaming jobs on YARN. Log aggregation is disabled in our cluster. We are considering Solr or Elasticsearch for search, possibly with Flume or Kafka to collect the Spark job logs.

Any suggestions on how to implement search on these logs and manage them easily?

Thanks,

Suri

Accepted solution

You can achieve this by setting an appropriate value for the following property in yarn-site.xml:

yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds

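YARN will then aggregate the logs of running jobs at that interval, instead of only after the application finishes. Log aggregation itself must also be enabled (yarn.log-aggregation-enable set to true), since it is currently disabled in this cluster.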

https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
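A minimal yarn-site.xml sketch, assuming Hadoop 2.x and that these two properties are merged into the existing <configuration> element. The 3600-second interval is only an example value, not a recommendation. yarn.log-aggregation-enable is included because the question says aggregation is disabled, and the rolling interval has no effect without it.

```xml
<!-- yarn-site.xml (sketch): enable log aggregation and upload logs of running apps periodically -->
<configuration>
  <!-- Log aggregation is off by default; rolling aggregation needs it switched on -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- Example value: upload container logs every hour while the application runs.
       The default of -1 uploads logs only when the application finishes. -->
  <property>
    <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
    <value>3600</value>
  </property>
</configuration>
```

The NodeManagers need to be restarted to pick up yarn-site.xml changes. On the Spark side, newer releases document spark.yarn.rolledLog.includePattern to select which log files are aggregated in this rolling fashion; check the "Running on YARN" page for your Spark version before relying on it.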

Suri
