We use Oozie to orchestrate our application's workflow, which includes Spark, Java and Bash actions. We want to aggregate all the logs in one location using the ELK stack. The problem is two-fold:

1) Getting the logs while the workflow is still running.
2) Short retention times for YARN logs after the application reaches the completed state, which leaves a very small window to collect the logs before they are lost.

Does anyone have experience with this, and can you offer any suggestions? Please note that we do not have admin rights on the cluster to change any configurations. In an ideal world, the Oozie, YARN and application logs would be streamed as they occur.

There are several solutions we're considering:

Solution 1)
1) Using the Oozie REST API, poll for RUNNING workflows and get the ApplicationIds of the YARN jobs.
2) Using the YARN REST API, poll for task attempts and fetch the logs from the nodes via the API.
3) Dump the logs somewhere.
4) Use Filebeat to listen to the logs path.

Solution 2)
1) Wait for the Oozie workflow to complete.
2) Using the Oozie REST API, get all ApplicationIds of the YARN jobs submitted by the sub-workflows.
3) Collect the aggregated logs with "yarn logs -applicationId <ApplicationId>".
4) Use Filebeat to listen to the logs path.

Solution 3)
1) In all applications handled by the Oozie workflow, write logs to a TCP socket appender.

* This would mean the YARN nodes stream the log events as they occur. However, logs from YARN and Oozie themselves would not be persisted this way; only the application logs would be saved, so we would still lose those unless there is a way to force YARN to use an additional log4j appender for specific jobs only.

Thanks in advance for any help you can offer.
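For Solution 3, the appender side could look like the log4j 1.x fragment below (host and port are placeholders). Note that SocketAppender ships Java-serialized LoggingEvents, so the receiving end must understand that wire format; keeping the console appender means the normal YARN container logs are unaffected.

```properties
# Hypothetical log4j.properties fragment: stream application log events
# to a central collector in addition to the normal console/YARN log.
log4j.rootLogger=INFO, console, socket

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %p %c - %m%n

# SocketAppender sends serialized LoggingEvents over TCP; the receiver
# must decode that format rather than plain text.
log4j.appender.socket=org.apache.log4j.net.SocketAppender
log4j.appender.socket.RemoteHost=collector-host
log4j.appender.socket.Port=4560
log4j.appender.socket.ReconnectionDelay=10000
```

This only captures the application's own loggers, as noted above; the YARN and Oozie daemon logs still need one of the other collection paths.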
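For reference, here is a minimal sketch of the common piece of Solutions 1 and 2: polling the Oozie REST API for workflows, mapping each action's external ID to a YARN application ID, and dumping the aggregated logs where Filebeat can pick them up. The Oozie host/port and the output directory are placeholders; the `job_*` to `application_*` renaming is the standard correspondence between MapReduce job IDs and YARN application IDs.

```python
import json
import subprocess
import urllib.request

OOZIE = "http://oozie-host:11000/oozie"  # placeholder host/port


def external_to_application_id(external_id):
    """Oozie reports launcher jobs as job_<cluster>_<seq>; the
    corresponding YARN id is application_<cluster>_<seq>."""
    if external_id and external_id.startswith("job_"):
        return "application_" + external_id[len("job_"):]
    return external_id


def running_application_ids():
    """Poll the Oozie REST API for RUNNING workflows and collect the
    YARN application ids of their actions."""
    url = OOZIE + "/v2/jobs?jobtype=wf&filter=status%3DRUNNING"
    with urllib.request.urlopen(url) as resp:
        workflows = json.load(resp).get("workflows", [])
    app_ids = []
    for wf in workflows:
        with urllib.request.urlopen(OOZIE + "/v2/job/" + wf["id"] + "?show=info") as resp:
            info = json.load(resp)
        for action in info.get("actions", []):
            app_id = external_to_application_id(action.get("externalId"))
            if app_id and app_id.startswith("application_"):
                app_ids.append(app_id)
    return app_ids


def dump_logs(app_id, out_dir="/var/log/collected"):
    """Solution 2: write 'yarn logs' output to a path Filebeat watches.
    Only works once YARN has aggregated the logs (job completed)."""
    with open("%s/%s.log" % (out_dir, app_id), "w") as out:
        subprocess.run(["yarn", "logs", "-applicationId", app_id],
                       stdout=out, check=True)
```

Run on a schedule (cron or a small daemon), this covers the completed-jobs window; for Solution 1's in-flight logs you would swap `dump_logs` for calls to the YARN node/container REST endpoints instead.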