It looks like the logs for the application are not being completely aggregated, and some of them may be truncated as well. In the RM we can see a timeout for some of the NodeManagers for this application. The job runs for 55 minutes, and we have set yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds to 3600. Any pointers?
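For reference, the two numbers in the question can be compared directly. The sketch below is only back-of-the-envelope arithmetic, not YARN code: the helper name `completed_roll_cycles` is hypothetical, and it assumes the rolling timer starts counting when the job starts, which may not match how a NodeManager actually schedules its uploads.

```python
def completed_roll_cycles(job_runtime_seconds: int, roll_interval_seconds: int) -> int:
    """Count how many full roll-monitoring intervals fit inside the job runtime.

    Hypothetical helper for illustration only. A non-positive interval is
    treated here as "rolling aggregation disabled" and yields zero cycles.
    """
    if roll_interval_seconds <= 0:
        return 0
    return job_runtime_seconds // roll_interval_seconds


# Values taken from the question: a 55-minute job and a 3600-second interval.
job_runtime_seconds = 55 * 60
roll_interval_seconds = 3600

cycles = completed_roll_cycles(job_runtime_seconds, roll_interval_seconds)
print(f"runtime={job_runtime_seconds}s interval={roll_interval_seconds}s full cycles={cycles}")
```

Under that assumption the job (3300 seconds) ends before one full 3600-second interval elapses, so the rolling interval by itself may not be what drives the upload for this job.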
@andyk As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks
Diana Torres, Community Moderator