Created 10-13-2021 06:15 AM
Hi experts,
I would like to confirm my understanding of the YARN local and log directories, or get help understanding them better.
My understanding is that YARN downloads the data an application needs to a local filesystem so that it is readily accessible while the job runs, and that it also writes the logs for that application locally. I believe these are temporary files, as they will be stored in HDFS after the job completes.
[root@test01 log]# ll /hadoop/yarn/
total 0
drwxr-xr-x. 6 yarn hadoop 78 Oct 13 08:09 local
drwxrwxr-x. 8 yarn hadoop 239 Oct 13 08:09 log
Can someone please help confirm my understanding or help me better understand this concept?
Also, what is the usual best practice for mounting these directories: should they sit on the local filesystem, on a separate hard drive, or can they share a hard drive with one of the DataNode directories?
Any help is much appreciated.
Thanks,
Created 10-15-2021 04:17 PM
The localized log directory of an application will be found in
${yarn.nodemanager.log-dirs}/application_${appid}.
Individual containers will have their log directories in subdirectories named container_${contid}.
Each container directory will contain the stderr, stdout, and syslog files generated by that particular container.
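As a sketch of where that path comes from: the value of yarn.nodemanager.log-dirs is set in yarn-site.xml, alongside yarn.nodemanager.local-dirs for localized files. The fragment below assumes that the two directories shown in the listing in the question (/hadoop/yarn/local and /hadoop/yarn/log) are the configured values; that is a guess from the listing, not something confirmed in this thread.

```xml
<!-- yarn-site.xml (sketch): values assumed from the /hadoop/yarn/ listing in the question -->
<property>
  <!-- where the NodeManager keeps localized files for running containers -->
  <name>yarn.nodemanager.local-dirs</name>
  <value>/hadoop/yarn/local</value>
</property>
<property>
  <!-- container logs are written under this directory, one application_${appid} folder per application -->
  <name>yarn.nodemanager.log-dirs</name>
  <value>/hadoop/yarn/log</value>
</property>
```

Both properties accept a comma-separated list of directories.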
Created 10-15-2021 05:59 PM
Thanks @Faizan_Ali for the explanation.
So in other words, once the job completes, are these logs then stored in HDFS? If not, where are the logs stored after the application completes?
Are the YARN local and log directories only for temporary use, while a job runs in the Hadoop cluster?
Thanks,
Created on 10-16-2021 03:04 AM - edited 10-16-2021 05:38 AM
By default, the command yarn application -list shows only the applications that are in the SUBMITTED, ACCEPTED, or RUNNING state.
**TO ENABLE LOG AGGREGATION:**
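Log aggregation is enabled in the yarn-site.xml file. Setting the yarn.log-aggregation-enable property to true makes the NodeManagers collect each application's container logs and upload them to HDFS, by default once the application has finished.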
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
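With aggregation enabled, the HDFS destination is controlled by yarn.nodemanager.remote-app-log-dir. A minimal sketch, assuming the stock default of /tmp/logs is acceptable (adjust the path for your cluster):

```xml
<!-- yarn-site.xml (sketch): HDFS directory that aggregated logs are uploaded to -->
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>
```

After an application finishes, its aggregated logs can be read with yarn logs -applicationId <application ID>.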
Created 10-16-2021 09:00 AM
@Faizan_Ali Thanks for the explanation.
Makes sense.
So while an application is running, it writes the container logs to the local directory ${yarn.nodemanager.log-dirs}/application_${appid}; then, after the application completes, the logs are aggregated into yarn.nodemanager.remote-app-log-dir.
Ok thanks for the explanation.