Can we change the yarn.nodemanager.log-dirs value from local to HDFS?

Guru

Team,

We have multiple jobs running on our servers, and while they run they create a lot of staging data in the local /var/log/yarn/log directory. I understand this is because of the container logs and the yarn.nodemanager.log-dirs property.

We have 100 GB for this location, but it still gets full. Is there any way to redirect these logs to HDFS?
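
For reference, this is how I am checking the usage at the moment (assuming everything sits on the /var/log mount, as in our setup):

df -h /var/log
du -sh /var/log/yarn/log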

Thanks in advance.

16 REPLIES

Super Guru
@Saurabh Kumar

How about enabling YARN log aggregation? Once your job completes, it will automatically move the job logs from the local directories to a centralized location on HDFS. See the link below; a rough sketch of the relevant settings follows it.

http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
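
These are the standard yarn-site.xml properties involved; the HDFS path and retention value below are only examples, not your cluster's values:

yarn.log-aggregation-enable=true
yarn.nodemanager.remote-app-log-dir=/app-logs
yarn.log-aggregation.retain-seconds=2592000   (30 days)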

Guru

Thanks @Jitendra Yadav.

I have done that, but the directory still reaches 100% usage.
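
For what it's worth, I can check whether the aggregated logs are actually landing on HDFS with something like this (assuming the default /app-logs location for yarn.nodemanager.remote-app-log-dir; ours may differ):

hdfs dfs -du -h /app-logs
yarn logs -applicationId <application_id> | head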

Super Guru

@Saurabh Kumar

Oh, so you mean you have 100 GB dedicated on each NodeManager node for YARN logs, YARN log aggregation is also enabled, and you are still facing this issue with the local log dir? I think we need to check why that location is not getting cleared after job completion; maybe something else is occupying the space?
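
Two NodeManager settings worth checking in that case; the values shown are just the usual defaults, not necessarily what your cluster has:

yarn.nodemanager.delete.debug-delay-sec=0   (anything non-zero keeps finished containers' dirs around longer)
yarn.nodemanager.log.retain-seconds=10800   (only applies when log aggregation is disabled)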

Guru

@Jitendra Yadav: Yes, we have 100 GB allocated on all worker nodes, and the data does get cleaned up once a job completes or fails. My concern is that many users run very big queries, and those jobs consume the whole 100 GB, and sometimes even more. Because of that, jobs are failing.

Super Guru

OK, so if I roughly calculate size vs. number of jobs, i.e. if each job generates 100 MB of logs on each node, it would take about 1,000 jobs running at the same time to fill 100 GB. Is that the case? If not, then:

1. Either some other log is occupying space in the same partition.

2. Or the YARN job logs are not getting cleaned up fast enough.

3. Or you have a big cluster where you are running hundreds of jobs with some extra debugging enabled. If this is the case, then you need to reorganize the logging configuration and consider increasing or adding space in the yarn.nodemanager.log-dirs partition (a rough sketch of the relevant logging settings follows this list).
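
For item 3, per-container verbosity is usually controlled by settings like these (the WARN values are only an illustration; hive.tez.log.level only matters if the heavy queries run Hive on Tez):

mapreduce.map.log.level=WARN
mapreduce.reduce.log.level=WARN
hive.tez.log.level=WARN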

Can you share the disk usage of parent directories from that 100GB partition? @Saurabh Kumar
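
Something like this would help (assuming the partition is mounted at /var/log):

du -sh /var/log/* | sort -rh | head -20
df -h /var/log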

Super Guru
@Saurabh Kumar

Can you please share the values of the parameters below? (A quick way to pull them is sketched after the list.)

yarn.nodemanager.local-dirs

hadoop.tmp.dir
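
If they are not handy in Ambari, you can grep them from a worker node (assuming the usual /etc/hadoop/conf client-config layout and one-property-per-line XML formatting):

grep -A1 'yarn.nodemanager.local-dirs' /etc/hadoop/conf/yarn-site.xml
grep -A1 'hadoop.tmp.dir' /etc/hadoop/conf/core-site.xml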

Super Guru

@Saurabh Kumar Then I can only think of increasing the yarn.nodemanager.log-dirs capacity by adding multiple mount points. But I still suspect that something else is also occupying the space.
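
As an illustration only (the mount points below are placeholders, not your actual layout), spreading the NodeManager log dirs across several disks would look like this:

yarn.nodemanager.log-dirs=/grid01/hadoop/yarn/nm-logs,/grid02/hadoop/yarn/nm-logs,/grid03/hadoop/yarn/nm-logs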

Guru

@Jitendra Yadav: Let me explain my issue a little more. We have 52 worker nodes in total, and each node has 100 GB dedicated to /var/log. Users run a very big Hive query (with 20 or more left or right joins), and a single run of that query creates around 100 GB of logs across its many containers. This is the cause of the issue, and it triggers alerts. Once the job fails or completes, the logs are cleaned up immediately.
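
When the alert fires, I can see which application is eating the space with something like this (assuming the container logs land under /var/log/yarn/log as mentioned earlier):

du -sh /var/log/yarn/log/application_* | sort -rh | head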

Guru

@Jitendra Yadav: We have the following values for the requested properties.

yarn.nodemanager.local-dirs=/grid01/hadoop/yarn/log,/grid03/hadoop/yarn/log,/grid04/hadoop/yarn/log,/grid05/hadoop/yarn/log,/grid06/hadoop/yarn/log,/grid07/hadoop/yarn/log,/grid08/hadoop/yarn/log,/grid09/hadoop/yarn/log,/grid10/hadoop/yarn/log

And I could not find any value for hadoop.tmp.dir.
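
If I understand correctly, when hadoop.tmp.dir is not set explicitly it falls back to the built-in default /tmp/hadoop-${user.name}; it can be confirmed with:

hdfs getconf -confKey hadoop.tmp.dir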