Hi, we need to find a way to maintain and search logs for long-running Spark Streaming jobs on YARN. We have log aggregation disabled in our cluster. We are considering Solr/Elasticsearch, and maybe Flume or Kafka, to collect the Spark job logs.
Any suggestions on how to implement search on these logs and manage them easily?
We want to search for key phrases, and at the same time we want developers to be able to look into the raw logs for their troubleshooting, plus alerts for specific errors.
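One way to feed a Kafka-based pipeline is to ship a custom log4j configuration with the Spark job so each executor publishes its log events to a Kafka topic, which a downstream indexer (into Solr or Elasticsearch) can then consume, while the console appender keeps the raw logs on the node for developers. A sketch, assuming the kafka-log4j-appender jar is on the executor classpath; the broker address and topic name are placeholders:

```properties
# log4j.properties, distributed with --files and activated via
# spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties
log4j.rootLogger=INFO, console, KAFKA

# Keep raw logs locally so the YARN UI / developers can still read them
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n

# Also publish every log event to a Kafka topic for indexing/search
log4j.appender.KAFKA=org.apache.kafka.log4jappender.KafkaLog4jAppender
# Placeholder broker and topic - replace with your cluster's values
log4j.appender.KAFKA.brokerList=kafka-broker:9092
log4j.appender.KAFKA.topic=spark-app-logs
log4j.appender.KAFKA.layout=org.apache.log4j.PatternLayout
log4j.appender.KAFKA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```

This keeps both requirements in one place: the search system indexes the Kafka topic, and the raw per-container logs remain untouched on the NodeManagers.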
The documentation for YARN log aggregation says that logs are aggregated after an application completes.
Streaming jobs run for a much longer duration and potentially never terminate. I want to get the logs for my streaming jobs into HDFS before the application completes or terminates. What are better ways to do this, since log aggregation only does it after the job completes?
So, if I set yarn.log-aggregation.retain-check-interval-seconds to 60 seconds, will it send the logs to HDFS every 60 seconds even when the job has not finished? (Since streaming jobs run forever.)
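As far as I understand, yarn.log-aggregation.retain-check-interval-seconds only controls how often YARN checks whether old aggregated logs should be deleted, not how often logs are uploaded. For uploading logs of still-running applications, newer Hadoop releases (2.6+) have a rolling log aggregation setting on the NodeManager. A sketch of the yarn-site.xml changes, assuming a version that supports it; note many versions enforce a minimum interval (e.g. 3600 seconds):

```xml
<!-- Turn on log aggregation to HDFS -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

<!-- Periodically upload logs of running applications (rolling aggregation).
     Value is in seconds; -1 (the default) disables it. -->
<property>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>3600</value>
</property>
```

With this enabled, container logs for a long-running streaming job are uploaded to HDFS at each interval instead of only at application completion.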