Created on 05-17-2016 01:06 PM - edited 09-16-2022 03:20 AM
Team,
We have multiple jobs running on our servers, and while they run they create a lot of staging/log data in the local /var/log/yarn/log directory. I understand this is because of the containers and the yarn.nodemanager.log-dirs property.
We have 100 GB allocated for this location, but it still gets full. Is there any way we can redirect these logs to HDFS?
Thanks in advance.
Created 05-17-2016 01:20 PM
How about enabling YARN log aggregation? Once your job completes, it will automatically move the job logs from the local directories to a centralized location in HDFS.
http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
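For reference, a minimal yarn-site.xml sketch of the log-aggregation settings (the HDFS path and retention value below are examples, adjust them for your cluster):

<!-- yarn-site.xml: minimal log-aggregation sketch (example values) -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <!-- HDFS location where aggregated container logs are stored -->
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/app-logs</value>
</property>
<property>
  <!-- how long aggregated logs are kept in HDFS (7 days here) -->
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>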
Created 05-17-2016 01:38 PM
Thanks @Jitendra Yadav.
I have done that, but the directory is still getting 100% used.
Created 05-17-2016 01:48 PM
Oh, so you mean you have 100 GB dedicated on each NodeManager node for YARN logs, YARN log aggregation is enabled, and you are still facing this issue with the local log directory? I think we need to check why that location is not getting cleared after job completion; maybe something else is occupying the space?
Created 05-17-2016 03:17 PM
@Jitendra Yadav: Yes, we have 100 GB allocated on all worker nodes, and the data does get cleaned once a job completes or fails. But my issue is that many users are running very big queries, and while they run those jobs consume the whole 100 GB, or even more than that. Because of that, jobs are failing.
Created 05-17-2016 03:40 PM
OK, so if I roughly calculate size vs. number of jobs, i.e. if each job generates 100 MB of logs on each node, that means you could have up to 1,000 jobs running at the same time. Is that the case?
1. Either some other logs are occupying space in the same partition.
2. Or the YARN job logs are not getting cleaned up fast enough (see the cleanup settings sketched at the end of this post).
3. Or you have a big cluster where you are running hundreds of jobs with some extra debug logging. If this is the case, then you need to reorganize the logging configuration and consider increasing or adding space in the yarn.nodemanager.log-dirs partition.
Can you share the disk usage of the parent directories on that 100 GB partition? @Saurabh Kumar
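If the local logs are lingering, these are the cleanup-related yarn-site.xml settings I would check first (the values below are the usual defaults, shown only as a sketch):

<!-- yarn-site.xml: cleanup-related settings worth checking (example values) -->
<property>
  <!-- keep this at 0 unless debugging; a positive value delays deletion of
       local container logs/files after an application finishes -->
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>0</value>
</property>
<property>
  <!-- only applies when log aggregation is disabled: how long local logs are kept -->
  <name>yarn.nodemanager.log.retain-seconds</name>
  <value>10800</value>
</property>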
Created 05-17-2016 07:30 PM
Can you please share the values of the below parameters?
yarn.nodemanager.local-dirs
hadoop.tmp.dir
Created 05-18-2016 11:15 AM
@Saurabh Kumar Then I can only think of increasing the yarn.nodemanager.log-dirs capacity by adding multiple mount points. But I still suspect that something else is also occupying the space.
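For example, something like this in yarn-site.xml (the mount points below are hypothetical, substitute your own grid mounts) spreads container logs across several disks instead of a single 100 GB partition:

<property>
  <!-- comma-separated list; the NodeManager spreads container logs across these dirs -->
  <name>yarn.nodemanager.log-dirs</name>
  <value>/grid01/hadoop/yarn/log,/grid02/hadoop/yarn/log,/grid03/hadoop/yarn/log</value>
</property>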
Created 05-17-2016 05:25 PM
@Jitendra Yadav: Let me explain my issue a little bit more. We have 52 worker nodes in total, and each node has 100 GB dedicated to /var/log. Users run a very big Hive query (with 20 or more left or right joins), and during a single run that query creates ~100 GB of log/metadata across many containers. This is the cause of the issue, and it triggers alerts. Once the job fails or completes, the logs are cleaned up immediately.
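Would lowering the task log verbosity or capping the per-task log size help here? Something like the sketch below, assuming the query runs as MapReduce jobs (values are examples; if Hive runs on Tez, the equivalent Tez/Hive logging settings would apply instead), e.g. in mapred-site.xml or per session:

<property>
  <!-- lower map task logging from INFO to WARN -->
  <name>mapreduce.map.log.level</name>
  <value>WARN</value>
</property>
<property>
  <!-- lower reduce task logging from INFO to WARN -->
  <name>mapreduce.reduce.log.level</name>
  <value>WARN</value>
</property>
<property>
  <!-- cap each task's syslog at ~50 MB; 0 means unlimited -->
  <name>mapreduce.task.userlog.limit.kb</name>
  <value>51200</value>
</property>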
Created 05-18-2016 06:06 AM
@Jitendra Yadav: We have the following values for the requested properties.
yarn.nodemanager.local-dirs=/grid01/hadoop/yarn/log,/grid03/hadoop/yarn/log,/grid04/hadoop/yarn/log,/grid05/hadoop/yarn/log,/grid06/hadoop/yarn/log,/grid07/hadoop/yarn/log,/grid08/hadoop/yarn/log,/grid09/hadoop/yarn/log,/grid10/hadoop/yarn/log
And I could not find any value for hadoop.tmp.dir.
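If hadoop.tmp.dir is not set anywhere, I assume it falls back to its default of /tmp/hadoop-${user.name}. We could set it explicitly in core-site.xml if it needs to sit on a bigger partition, e.g. (the path below is just an example):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/grid01/hadoop/tmp</value>
</property>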