We noticed that when a YARN job fails and log aggregation is enabled, we can't find the container logs for the failed job in the usual HDFS folder (/app-logs/…). We can see the logs for all jobs that finished successfully, but nothing for the failed ones. Our log aggregation retention is set to 7 days and the problem occurred 2 days ago, so the logs should not have been purged yet.
We're wondering whether this could be a bug in the aggregation process, but we would like further information from Hortonworks Support or someone in the community to confirm that or offer another explanation.
The ownership and permissions on the default /app-logs/... folder are correct, since we can see the logs for all jobs that finished successfully.
Any tips about this problem?
Many thanks in advance for the kind cooperation.
It is possible that the job failed before any log could be created. Navigate to the Resource Manager UI and look for the failed job. Once you click on your application ID, it will show the log aggregation status and the possible failure reason. You can also open the AM log from there.
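If you would rather check this from a script than from the UI, here is a rough sketch that reads the same information from the Resource Manager REST API (`/ws/v1/cluster/apps/{appid}`). The host, port and application ID in the usage note are placeholders, and I am assuming your Hadoop version exposes the `logAggregationStatus` field in the response; on older versions it may be missing, in which case the sketch reports `UNKNOWN`.

```python
import json
from urllib.request import urlopen


def app_url(rm_base, app_id):
    """Build the Resource Manager REST URL for a single application."""
    return f"{rm_base.rstrip('/')}/ws/v1/cluster/apps/{app_id}"


def summarize_app(payload):
    """Pick out the fields that explain a failure from the RM JSON response."""
    app = payload.get("app", {})
    return {
        "state": app.get("state"),
        "finalStatus": app.get("finalStatus"),
        "diagnostics": app.get("diagnostics", ""),
        # Assumption: this field is present only on versions that report it.
        "logAggregationStatus": app.get("logAggregationStatus", "UNKNOWN"),
    }


def fetch_app_summary(rm_base, app_id, timeout=10):
    """Query the RM (unsecured HTTP assumed) and return the summary dict."""
    with urlopen(app_url(rm_base, app_id), timeout=timeout) as resp:
        return summarize_app(json.load(resp))
```

For example, `fetch_app_summary("http://rm-host.example.com:8088", "application_1234567890123_0001")` would return the state, diagnostics and aggregation status for that (hypothetical) application. A Kerberized or HTTPS cluster needs extra authentication handling that this sketch does not cover.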
Hi @Pranay Vyas,
If I navigate to the Resource Manager UI or the AM log, I can't see any logs, because the logs shown in those sections are taken, through the aggregation service, from the container logs on the server where the NodeManager that managed this job is running. So if, for any reason, the container logs for this failed job were never created, it is normal that I can't see anything in the sections mentioned earlier.
At this point, the main question is: why does a failed job log nothing in its container logs, and why do the Resource Manager logs on the master server contain nothing about the root cause of the failure either? Could there be a bug in this flow?
Many thanks for your time and availability.