Created 06-27-2017 06:52 PM
I want to do 'hdfs dfs -rm -R /ats/done' because there are some large files in there taking up a lot of space. Is this safe to do?
I also want to clear out the logs in /app-logs/. Can I delete these manually as well?
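In case it's useful, here is roughly how I confirmed the usage (standard HDFS CLI; paths are from my cluster):

# Per-directory usage under /ats/done, to spot the large files
hdfs dfs -du -h /ats/done
# Total space used by the aggregated application logs
hdfs dfs -du -s -h /app-logs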
Thanks,
Mike
Created 06-27-2017 07:34 PM
Hi @MPH
You might be safer configuring the timeline properties in the YARN configs (and ensuring the TTL service is enabled) and letting YARN clean them up from the LevelDB store, rather than brute-force deleting them. You can reduce the time-to-live and restart the ATS service to have it kick in and free up the space. See the following link in case it helps further:
https://community.hortonworks.com/questions/46385/yarn-timeline-db-consuming-466gb-space.html
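As a rough sketch (property names from the ATS v1.x docs; the values here are only illustrative, not recommendations), the relevant yarn-site.xml settings look something like:

<property>
  <name>yarn.timeline-service.ttl-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.ttl-ms</name>
  <!-- illustrative: 2 days instead of the 7-day default -->
  <value>172800000</value>
</property>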
The /app-logs directory can also be managed by the YARN configs for retention time. I think it's:
yarn.nodemanager.log.retain-seconds
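For the aggregated logs under /app-logs specifically, I believe the retention is governed by the log-aggregation setting rather than the NodeManager local-log one; something along these lines in yarn-site.xml (value illustrative):

<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <!-- illustrative: keep aggregated logs for 7 days -->
  <value>604800</value>
</property>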
Created 06-27-2017 08:53 PM
Hi - I've made these changes to the parameters and restarted YARN, but the large files still remain in /ats/done.
Created 06-28-2017 09:26 AM
yarn.timeline-service.entity-group-fs-store.retain-seconds - I've tried reducing this (to 60 seconds), but it still doesn't seem to clear out the logs.
Any ideas?
Created 06-28-2017 12:48 PM
Hi @MPH
Check the other related parameters in the documentation I mentioned above, such as: yarn.timeline-service.entity-group-fs-store.scan-interval-seconds
And make sure to restart the YARN Timeline Server after making the changes.
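For example (property names from the ATS v1.5 entity-group-fs-store settings; values illustrative), the combination that has to line up is roughly:

<property>
  <name>yarn.timeline-service.entity-group-fs-store.retain-seconds</name>
  <value>60</value>
</property>
<property>
  <name>yarn.timeline-service.entity-group-fs-store.scan-interval-seconds</name>
  <value>60</value>
</property>
<property>
  <!-- the cleaner only runs on this interval, so a low retain-seconds
       alone will not trigger an immediate purge -->
  <name>yarn.timeline-service.entity-group-fs-store.cleaner-interval-seconds</name>
  <value>3600</value>
</property>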
Created 06-28-2017 11:05 AM
On further investigation of the Timeline Server log file, I saw that there was periodically a FileNotFoundException when attempting to clean out the earliest application log directory that still contained data in /ats/done:
2017-06-28 11:25:07,910 INFO timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting hdfs://XXX:8020/ats/done/1494799829596/0000/000/application_1494799829596_0508
2017-06-28 11:25:07,924 ERROR timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files java.io.FileNotFoundException: File hdfs:/XXX:8020/ats/done/1494799829596/0000/000/application_1494799829596_0508 does not exist.
It seems that because this file was missing from the directory, the cleaner process died; from that point on, the logs have been building up because the process has been unable to clear them, causing the storage problems.
The question is: why does the process not continue to the next logs if it cannot find a specific file to delete? Or, for that matter, why is it looking for a specific file at all when it should just purge whatever is there once the timestamp has expired?
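One possible way to unblock the cleaner, assuming it aborts on the first missing path (I haven't confirmed this is safe in all cases): recreate the missing directory so the delete succeeds and the scan can move past it, e.g.:

# recreate the path the cleaner is failing on, taken from the log above
hdfs dfs -mkdir -p /ats/done/1494799829596/0000/000/application_1494799829596_0508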