It looks like the logs for the application are not being aggregated completely, and some of them may be truncated as well. From the RM we can see that, for this application, there is a timeout for some of the NodeManagers. The job runs for 55 minutes, and we have set yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds to 3600. Any pointers?
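
For reference, here is a quick sanity check of the two numbers above. It is only a minimal sketch: the variable names are made up for illustration, and only the two values (55 minutes and 3600 seconds) come from our setup.

```python
# Hypothetical names; only the two values are taken from the setup described above.
JOB_RUNTIME_MINUTES = 55
ROLL_MONITORING_INTERVAL_SECONDS = 3600  # yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds

# Convert the job runtime to seconds so both values use the same unit.
job_runtime_seconds = JOB_RUNTIME_MINUTES * 60

# Positive margin means the job ends before one full interval has elapsed.
margin_seconds = ROLL_MONITORING_INTERVAL_SECONDS - job_runtime_seconds

print(f"job runtime: {job_runtime_seconds} s")
print(f"roll monitoring interval: {ROLL_MONITORING_INTERVAL_SECONDS} s")
print(f"margin: {margin_seconds} s")
```

So the job finishes roughly five minutes before one full interval would have elapsed, assuming the interval is counted from around the time the job starts. I am not sure whether that is related to the timeout.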