
Manage YARN local log-dirs space

Explorer

Hi,

We have enabled log aggregation in our Hadoop cluster, but we still see a lot of files stored locally on the individual nodes (/u01/hadoop/yarn/local/usercache/hive/appcache). I suppose these files should be moved to HDFS once the job is completed, but I am not sure whether this is happening. Is there a way to troubleshoot this, or is it safe to delete these files?

Regards,

Venkatesh S


4 REPLIES

Super Guru

@Venkadesh Sivalingam

Can you check the logs to see if there are any errors? [Check both the YARN and NodeManager logs.]

Usually, search for entries from org.apache.hadoop.yarn.logaggregation.
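For example, a quick way to search the NodeManager log for those entries (a sketch; the log path assumes a typical HDP layout and may differ on your nodes):

  # Look for log-aggregation errors or warnings in the NodeManager log
  grep -i "logaggregation" /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-*.log | grep -iE "error|warn"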

I see there are a few known bugs with log aggregation that are fixed in HDP 2.2 and later:

BUG-12006

https://issues.apache.org/jira/browse/YARN-2468

Which version of HDP are you using?
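If you are not sure, the build string printed by the following command includes the HDP build number (a minimal check):

  # Prints the Hadoop version and build, e.g. "Hadoop 2.7.1.2.3.x..."
  hadoop version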

Also, can you make sure these properties are in place and set correctly (a sample configuration snippet follows the list):

PROPERTIES RESPECTED WHEN LOG-AGGREGATION IS ENABLED

  • yarn.nodemanager.remote-app-log-dir: This is on the default file-system, usually HDFS, and indicates where the NMs should aggregate logs to. This should not be the local file-system, otherwise serving daemons like the history-server will not be able to serve the aggregated logs. Default is /tmp/logs.
  • yarn.nodemanager.remote-app-log-dir-suffix: The remote log dir will be created at {yarn.nodemanager.remote-app-log-dir}/${user}/{thisParam}. Default value is "logs".
  • yarn.log-aggregation.retain-seconds: How long to wait before deleting aggregated logs; -1 or a negative number disables the deletion of aggregated logs. One needs to be careful not to set this to too small a value, so as not to burden the distributed file-system.
  • yarn.log-aggregation.retain-check-interval-seconds: Determines how long to wait between aggregated-log retention checks. If it is set to 0 or a negative value, the value is computed as one-tenth of the aggregated-log retention time. As with the previous property, one needs to be careful not to set this too low. Defaults to -1.
  • yarn.log.server.url: Once an application is done, NMs redirect web UI users to this URL, where the aggregated logs are served. Today it points to the MapReduce-specific JobHistory server.
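To illustrate, here is a minimal yarn-site.xml sketch covering these properties. The values shown are just the defaults described above (the retain-seconds value is an arbitrary example), so adjust them for your cluster:

  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <!-- e.g. 30 days; -1 disables deletion of aggregated logs -->
    <value>2592000</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <!-- -1: computed as one-tenth of the retention time -->
    <value>-1</value>
  </property>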

Explorer

@Sagar Shimpi Thanks a lot for your response.

Our Hadoop version is Hadoop 2.7.1.2.3.2.0-2950, and all the settings related to log configuration look fine.

I have checked the YARN logs but found only the warnings below.

2016-05-25 15:39:55,813 WARN logaggregation.LogAggregationService (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple users.

2016-05-25 15:39:55,813 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(190)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.

We see some output files in the appcache folder, which is what is taking up the space.

/u01/hadoop/yarn/local/usercache/hive/appcache
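For reference, a quick way to see which application directories under appcache are consuming the space (a sketch; adjust the path to match your yarn.nodemanager.local-dirs setting):

  # Show per-application appcache usage, largest last
  du -sh /u01/hadoop/yarn/local/usercache/*/appcache/* | sort -h | tail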

Guru

@Venkadesh Sivalingam

If the YARN local directory is the one that has the space issue, as indicated, then it is not related to YARN container logs but to YARN local data. This can be a valid case if the job is still running. If the job is not running, there are cases where crashed jobs leave YARN local data behind. If you want to clean this up, you can stop the NodeManager on that node (when no containers are running on it) and clean up all /yarn/local directories.
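A rough sketch of that cleanup, assuming an HDP-style layout (the yarn-daemon.sh path and the local-dir location are assumptions here; verify them against yarn.nodemanager.local-dirs on your cluster, and only do this while no containers are running on the node):

  # Stop the NodeManager on this node
  su -l yarn -c "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh stop nodemanager"
  # Remove leftover application caches under the YARN local dir
  rm -rf /u01/hadoop/yarn/local/usercache/*/appcache/*
  # Start the NodeManager again
  su -l yarn -c "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh start nodemanager"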

On another note, there is a warning about permissions on /app-logs. Please correct the file permissions (though I believe this is not causing an issue right now).
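For example, to set the permissions the warning expects ([rwxrwxrwt], i.e. world-writable with the sticky bit), something like the following, run as the HDFS superuser, should do (a sketch):

  # 1777 = rwxrwxrwt, the mode expected on the remote app-log dir
  hdfs dfs -chmod 1777 /app-logs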

Explorer

@Ravi Mutyala There is no job running currently, so I believe the files can be removed manually.

But does this happen with all failed jobs? Will it be a manual process every time to remove these kinds of leftover files, or is there a process available to remove them periodically?