Created 05-26-2016 10:45 AM
Hi,
We have enabled log aggregation in our Hadoop cluster, but we still see a lot of files stored locally on the individual nodes (/u01/hadoop/yarn/local/usercache/hive/appcache). I assume these files should be moved to HDFS once a job completes, but I am not sure whether that is happening. Is there a way to troubleshoot this, and is it safe to delete these files?
Regards,
Venkatesh S
Created 05-26-2016 11:26 AM
Can you check the logs to see if there are any errors? [Check both the YARN ResourceManager and NodeManager logs.]
Usually you can search for entries from org.apache.hadoop.yarn.logaggregation.
There are a few known bugs with log aggregation that are fixed in HDP 2.2 and later -
https://issues.apache.org/jira/browse/YARN-2468
What is the version of HDP you are using?
Also, can you make sure the log aggregation properties are in place and set correctly?
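For reference, here is a minimal sketch (not from the original reply) that reads yarn-site.xml, assuming the usual /etc/hadoop/conf location on an HDP node, and prints the log-aggregation-related properties worth double-checking; the keys are standard YARN property names.

import xml.etree.ElementTree as ET

# Assumed default config path on an HDP node; adjust for your cluster.
YARN_SITE = "/etc/hadoop/conf/yarn-site.xml"

# Standard YARN keys that control log aggregation and local/log directories.
KEYS = [
    "yarn.log-aggregation-enable",
    "yarn.nodemanager.remote-app-log-dir",
    "yarn.nodemanager.remote-app-log-dir-suffix",
    "yarn.nodemanager.log-dirs",
    "yarn.nodemanager.local-dirs",
    "yarn.nodemanager.delete.debug-delay-sec",
]

# Collect <name>/<value> pairs for the keys we care about.
props = {}
for prop in ET.parse(YARN_SITE).getroot().findall("property"):
    name = prop.findtext("name")
    if name in KEYS:
        props[name] = prop.findtext("value")

for key in KEYS:
    print(key, "=", props.get(key, "<not set, default applies>"))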
Created 05-26-2016 12:12 PM
@Sagar Shimpi Thanks a lot for your response.
Our Hadoop version is 2.7.1.2.3.2.0-2950, and all the settings related to log aggregation look fine.
I have checked the YARN logs but found only the warnings below.
2016-05-25 15:39:55,813 WARN logaggregation.LogAggregationService (LogAggregationService.java:verifyAndCreateRemoteLogDir(195)) - Remote Root Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple users.
2016-05-25 15:39:55,813 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:<init>(190)) - rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
We also see some output files in the appcache folder, which are taking up a lot of space:
/u01/hadoop/yarn/local/usercache/hive/appcache
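As a quick way to see which applications are holding that space, a sketch like the following could be used (added here for illustration; it assumes it is run directly on the affected NodeManager host, and the appcache path is the one quoted above):

import os

# Path reported in this thread; adjust per node / per yarn.nodemanager.local-dirs entry.
APPCACHE = "/u01/hadoop/yarn/local/usercache/hive/appcache"

def dir_size(path):
    # Sum the sizes of all regular files under the directory.
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total

# Report per-application usage, largest first.
sizes = {d: dir_size(os.path.join(APPCACHE, d))
         for d in os.listdir(APPCACHE) if d.startswith("application_")}
for app, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
    print("%10.1f MB  %s" % (size / 1024.0 / 1024.0, app))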
Created 05-26-2016 12:38 PM
If the YARN local directory is the one with the space issue as indicated, then this is not related to YARN container logs but to YARN local data. This can be expected if the job is still running. If the job is not running, crashed jobs can leave YARN local data behind. If you want to clean this up, you can stop the NodeManager on that node (when no containers are running on it) and clean up all the /yarn/local directories.
On another note, there is a warning about permissions on /app-logs. Please correct the file permissions (though I believe this is not causing an issue right now).
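For reference, a minimal sketch of that cleanup, assuming the NodeManager on the node has been stopped first and no containers are running there (the local dir is the one quoted earlier in this thread; anything left under appcache then belongs to finished or crashed applications):

import os
import shutil

# yarn.nodemanager.local-dirs value from this thread; run only with the NodeManager stopped.
LOCAL_DIR = "/u01/hadoop/yarn/local"

usercache = os.path.join(LOCAL_DIR, "usercache")
for user in os.listdir(usercache):
    appcache = os.path.join(usercache, user, "appcache")
    if not os.path.isdir(appcache):
        continue
    # Remove every leftover per-application directory under this user's appcache.
    for app in os.listdir(appcache):
        if app.startswith("application_"):
            target = os.path.join(appcache, app)
            print("removing", target)
            shutil.rmtree(target, ignore_errors=True)

The /app-logs warning quoted above corresponds to the sticky bit: rwxrwxrwt is mode 1777, so hdfs dfs -chmod 1777 /app-logs would set the permissions the NodeManager expects.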
Created 05-26-2016 12:46 PM
@Ravi Mutyala There is no job running currently, so I believe the files can be removed manually.
But does this happen with all failed jobs? Will removing such leftover files be a manual process every time, or is there a process available to remove these files periodically?