Can we change the yarn.nodemanager.log-dirs value from local to HDFS?

Guru

Team,

We have multiple jobs running on our servers, and while they run they create a lot of staging data in the local /var/log/yarn/log directory. I understand this is because of the container logs and the yarn.nodemanager.log-dirs property.

We have 100 GB for this location, but it still gets full. Is there any way to redirect these logs to HDFS?
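
For reference, this is how I am checking the usage at the moment (assuming everything sits on the /var/log mount, as in our setup):

df -h /var/log
du -sh /var/log/yarn/log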

Thanks in advance.

16 REPLIES

Super Guru
@Saurabh Kumar

How about enabling YARN log aggregation? Once your job completes, it will automatically move the job logs from the local directories to a centralized location on HDFS. See the link below; a rough sketch of the relevant settings follows it.

http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
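
These are the standard yarn-site.xml properties involved; the HDFS path and retention value below are only examples, not your cluster's values:

yarn.log-aggregation-enable=true
yarn.nodemanager.remote-app-log-dir=/app-logs
yarn.log-aggregation.retain-seconds=2592000   (30 days)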

Guru

Thanks @Jitendra Yadav.

I have done that, but the directory still reaches 100% usage.
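
For what it's worth, I can check whether the aggregated logs are actually landing on HDFS with something like this (assuming the default /app-logs location for yarn.nodemanager.remote-app-log-dir; ours may differ):

hdfs dfs -du -h /app-logs
yarn logs -applicationId <application_id> | head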

Super Guru

@Saurabh Kumar

Oh, so you mean you have 100 GB dedicated on each NodeManager node for YARN logs, YARN log aggregation is also enabled, and you are still facing this issue with the local log dir? I think we need to check why that location is not getting cleared after job completion; maybe something else is occupying the space?
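
Two NodeManager settings worth checking in that case; the values shown are just the usual defaults, not necessarily what your cluster has:

yarn.nodemanager.delete.debug-delay-sec=0   (anything non-zero keeps finished containers' dirs around longer)
yarn.nodemanager.log.retain-seconds=10800   (only applies when log aggregation is disabled)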

Guru

@Jitendra Yadav: Yes, we have 100 GB allocated on all worker nodes, and the data does get cleaned up once a job completes or fails. My concern is that many users run very big queries, and those jobs consume the whole 100 GB, and sometimes even more. Because of that, jobs are failing.

Super Guru

OK, so if I roughly calculate size vs. number of jobs, i.e. if each job generates 100 MB of logs on each node, it would take about 1,000 jobs running at the same time to fill 100 GB. Is that the case? If not, then:

1. Either some other log is occupying space in the same partition.

2. Or the YARN job logs are not getting cleaned up fast enough.

3. Or you have a big cluster where you are running hundreds of jobs with some extra debugging enabled. If this is the case, then you need to reorganize the logging configuration and consider increasing or adding space in the yarn.nodemanager.log-dirs partition (a rough sketch of the relevant logging settings follows this list).
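
For item 3, per-container verbosity is usually controlled by settings like these (the WARN values are only an illustration; hive.tez.log.level only matters if the heavy queries run Hive on Tez):

mapreduce.map.log.level=WARN
mapreduce.reduce.log.level=WARN
hive.tez.log.level=WARN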

Can you share the disk usage of parent directories from that 100GB partition? @Saurabh Kumar
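
Something like this would help (assuming the partition is mounted at /var/log):

du -sh /var/log/* | sort -rh | head -20
df -h /var/log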

Super Guru
@Saurabh Kumar

Can you please share the values of the parameters below? (A quick way to pull them is sketched after the list.)

yarn.nodemanager.local-dirs

hadoop.tmp.dir
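
If they are not handy in Ambari, you can grep them from a worker node (assuming the usual /etc/hadoop/conf client-config layout and one-property-per-line XML formatting):

grep -A1 'yarn.nodemanager.local-dirs' /etc/hadoop/conf/yarn-site.xml
grep -A1 'hadoop.tmp.dir' /etc/hadoop/conf/core-site.xml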

Super Guru

@Saurabh Kumar Then I can only think of increasing the yarn.nodemanager.log-dirs capacity by adding multiple mount points. But I still suspect that something else is also occupying the space.
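
As an illustration only (the mount points below are placeholders, not your actual layout), spreading the NodeManager log dirs across several disks would look like this:

yarn.nodemanager.log-dirs=/grid01/hadoop/yarn/nm-logs,/grid02/hadoop/yarn/nm-logs,/grid03/hadoop/yarn/nm-logs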

Guru

@Jitendra Yadav: Let me explain my issue a little more. We have 52 worker nodes in total, and each node has 100 GB dedicated to /var/log. Users run a very big Hive query (with 20 or more left or right joins), and a single run of that query creates around 100 GB of logs across its many containers. This is the cause of the issue, and it triggers alerts. Once the job fails or completes, the logs are cleaned up immediately.
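
When the alert fires, I can see which application is eating the space with something like this (assuming the container logs land under /var/log/yarn/log as mentioned earlier):

du -sh /var/log/yarn/log/application_* | sort -rh | head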

Guru

@Jitendra Yadav: We have the following values for the requested properties.

yarn.nodemanager.local-dirs=/grid01/hadoop/yarn/log,/grid03/hadoop/yarn/log,/grid04/hadoop/yarn/log,/grid05/hadoop/yarn/log,/grid06/hadoop/yarn/log,/grid07/hadoop/yarn/log,/grid08/hadoop/yarn/log,/grid09/hadoop/yarn/log,/grid10/hadoop/yarn/log

And I could not find any value for hadoop.tmp.dir.
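
If I understand correctly, when hadoop.tmp.dir is not set explicitly it falls back to the built-in default /tmp/hadoop-${user.name}; it can be confirmed with:

hdfs getconf -confKey hadoop.tmp.dir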