Is there any documentation or resource that walks through installing and configuring the ELK stack with Spark + YARN application logs?
I am looking to collect a subset of YARN application logs and copy them out of HDFS. Logstash would be configured on a separate server; the logs would be transferred there and then pushed into Elasticsearch. Elasticsearch would not be on the cluster, since we will also be collecting logs from Control-M and other third-party applications.
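For reference, this is roughly the kind of Logstash pipeline I have in mind; the paths, host, and index name here are placeholders, not a working setup:

```
input {
  file {
    # wherever the logs land after being copied out of HDFS (placeholder path)
    path => "/var/log/yarn-apps/*.log"
    start_position => "beginning"
  }
}
filter {
  # drop blank lines as a first pass at reducing noise
  if [message] =~ /^\s*$/ { drop { } }
}
output {
  elasticsearch {
    # Elasticsearch runs off-cluster (placeholder host)
    hosts => ["http://es-host:9200"]
    index => "yarn-app-logs-%{+YYYY.MM.dd}"
  }
}
```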
On our cluster, logs are stored in /tmp/logs/<user_svc_account>, which contains all the executor logs as separate files. I'm looking specifically for the stdout and stderr logs, but these seem to be embedded in the executor logs. There is a lot of noise and garbage in these logs that makes them difficult to pull into Elasticsearch.
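As I understand it, the aggregated files under /tmp/logs are in YARN's binary aggregation format rather than plain text, so I'd first render them with something like `yarn logs -applicationId <appId> -appOwner <user>` and then split the dump into per-type sections. This is a rough sketch of the splitting step, assuming the `LogType:` / `Log Contents:` / `End of LogType:` markers that recent Hadoop versions print (adjust if your version's output differs):

```python
import re

# Matches one section of a `yarn logs` text dump, e.g.:
#   LogType:stderr
#   LogLength:5
#   Log Contents:
#   ...body...
#   End of LogType:stderr
_SECTION = re.compile(
    r"LogType:(\S+)\s*\n"             # log type, e.g. stderr
    r"(?:Log Upload Time:[^\n]*\n)?"  # optional line in some versions
    r"LogLength:\d+\s*\n"
    r"Log Contents:\n"
    r"(.*?)"                          # the log body itself
    r"\nEnd of LogType:\1",           # closing marker repeats the type
    re.DOTALL,
)

def split_log_types(dump: str) -> dict:
    """Split a `yarn logs` dump into {log_type: [body, ...]} so that
    stdout/stderr can be shipped to Logstash separately from the rest."""
    sections = {}
    for log_type, body in _SECTION.findall(dump):
        sections.setdefault(log_type, []).append(body)
    return sections
```

The idea would be to run this per application, write the stdout/stderr sections out as plain files, and point the Logstash file input only at those, leaving the noisier log types behind.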
I have been trying to find as much information as I can about setting this up, but there isn't much out there. I'd appreciate any info anyone can provide.