Log management for Long-running Spark Streaming Jobs on YARN Cluster

SuriNuthalapati — Wed, 22 Feb 2017 00:35:15 GMT

Hi, we need to find a way to maintain and search the logs of long-running Spark Streaming jobs on YARN. Log aggregation is disabled in our cluster. We are considering Solr or Elasticsearch for indexing, and possibly Flume or Kafka to collect the Spark job logs.

Any suggestions on how to implement search on these logs and manage them easily?

Thanks,

Suri

Re: Log management for Long-running Spark Streaming Jobs on YARN Cluster

SuriNuthalapati — Thu, 23 Feb 2017 01:37:53 GMT

You can achieve this by setting an appropriate value for the following property in yarn-site.xml:

yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds

YARN will then aggregate the logs of jobs that are still running, too, instead of waiting for them to finish.
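For illustration, a minimal sketch of what the yarn-site.xml entries might look like. The 3600-second interval is only an example value, not a recommendation from this thread, and the sketch assumes log aggregation itself has to be switched on (yarn.log-aggregation-enable) for the rolling interval to have any effect. Check the yarn-default.xml for your Hadoop version for defaults and any minimum allowed interval.

```xml
<!-- yarn-site.xml fragment (sketch; values are assumptions, adjust for your cluster) -->

<!-- Assumed prerequisite: log aggregation must be enabled on the cluster -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

<!-- How often the NodeManager uploads logs of still-running applications.
     3600 is an example value; the default of -1 disables rolling aggregation. -->
<property>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>3600</value>
</property>
```

The NodeManagers would typically need a restart to pick up a change like this.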

https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

Suri