Created 06-27-2017 06:52 PM
I want to do 'hdfs dfs -rm -R /ats/done' because there are some large files in there taking up a lot of space. Is this safe to do?
I also want to clear out the logs in /app-logs/. Can I delete these manually as well?
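In case it's useful, here is roughly how I confirmed the usage (standard HDFS CLI; paths are from my cluster):

# Per-directory usage under /ats/done, to spot the large files
hdfs dfs -du -h /ats/done
# Total space used by the aggregated application logs
hdfs dfs -du -s -h /app-logs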
Thanks,
Mike
Created 06-27-2017 07:34 PM
Hi @MPH
You might be safer configuring the timeline properties in the YARN configs (and ensuring the TTL service is enabled) and letting YARN clean them up from the LevelDB store, rather than brute-force deleting them. You can reduce the time-to-live and restart the ATS service to have it kick in and free up the space. See the following link in case it helps further:
https://community.hortonworks.com/questions/46385/yarn-timeline-db-consuming-466gb-space.html
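As a rough sketch (property names from the ATS v1.x docs; the values here are only illustrative, not recommendations), the relevant yarn-site.xml settings look something like:

<property>
  <name>yarn.timeline-service.ttl-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.ttl-ms</name>
  <!-- illustrative: 2 days instead of the 7-day default -->
  <value>172800000</value>
</property>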
The /app-logs directory can also be managed by the YARN configs for retention time. I think it's:
yarn.nodemanager.log.retain-seconds
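For the aggregated logs under /app-logs specifically, I believe the retention is governed by the log-aggregation setting rather than the NodeManager local-log one; something along these lines in yarn-site.xml (value illustrative):

<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <!-- illustrative: keep aggregated logs for 7 days -->
  <value>604800</value>
</property>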
Created 06-27-2017 08:53 PM
Hi - I've made these changes to the parameters and restarted YARN, but the large files still remain in /ats/done.
Created 06-28-2017 09:26 AM
yarn.timeline-service.entity-group-fs-store.retain-seconds - I've tried reducing this (to 60 seconds), but it still doesn't seem to clear out the logs.
Any ideas?
Created 06-28-2017 12:48 PM
Hi @MPH
Check the other related parameters in the documentation I mentioned above, such as: yarn.timeline-service.entity-group-fs-store.scan-interval-seconds
And make sure to restart the YARN Timeline Server after making the changes.
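For example (property names from the ATS v1.5 entity-group-fs-store settings; values illustrative), the combination that has to line up is roughly:

<property>
  <name>yarn.timeline-service.entity-group-fs-store.retain-seconds</name>
  <value>60</value>
</property>
<property>
  <name>yarn.timeline-service.entity-group-fs-store.scan-interval-seconds</name>
  <value>60</value>
</property>
<property>
  <!-- the cleaner only runs on this interval, so a low retain-seconds
       alone will not trigger an immediate purge -->
  <name>yarn.timeline-service.entity-group-fs-store.cleaner-interval-seconds</name>
  <value>3600</value>
</property>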
Created 06-28-2017 11:05 AM
On further investigation of the Timeline Server log file, I saw that there was periodically a FileNotFoundException when attempting to clean out the earliest application log directory that still contained data in /ats/done:
2017-06-28 11:25:07,910 INFO timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting hdfs://XXX:8020/ats/done/1494799829596/0000/000/application_1494799829596_0508
2017-06-28 11:25:07,924 ERROR timeline.EntityGroupFSTimelineStore (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files java.io.FileNotFoundException: File hdfs:/XXX:8020/ats/done/1494799829596/0000/000/application_1494799829596_0508 does not exist.
It seems that because this file was missing from the directory, the cleaner process died; from that point on, the logs have been building up because the process has been unable to clear them, causing the storage problems.
The question is: why does the process not continue to the next logs if it cannot find a specific file to delete? Or, for that matter, why is it looking for a specific file at all when it should just purge whatever is there once the timestamp has expired?
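One possible way to unblock the cleaner, assuming it aborts on the first missing path (I haven't confirmed this is safe in all cases): recreate the missing directory so the delete succeeds and the scan can move past it, e.g.:

# recreate the path the cleaner is failing on, taken from the log above
hdfs dfs -mkdir -p /ats/done/1494799829596/0000/000/application_1494799829596_0508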