Can someone explain what the yarn local and log dirs do?
Labels: Apache Hadoop, Apache YARN
Created ‎10-13-2021 06:15 AM
Hi experts,
I'd like to confirm my understanding of the YARN local and log dirs, or get help understanding them better.
My understanding is that YARN downloads the data a job needs to a local filesystem so that it is more easily accessible while the job runs, and that it also keeps the logs for that particular application there. I believe these are temporary files, since they will be stored in HDFS after the job completes.
[root@test01 log]# ll /hadoop/yarn/
total 0
drwxr-xr-x. 6 yarn hadoop 78 Oct 13 08:09 local
drwxrwxr-x. 8 yarn hadoop 239 Oct 13 08:09 log
Can someone please help confirm my understanding or help me better understand this concept?
Also, what is the usual best practice for these directories: should they be mounted on a local filesystem or on another hard drive, or can they share a hard drive with one of the DataNode directories?
Any help is much appreciated.
Thanks,
Created ‎10-15-2021 04:17 PM
The localized log directory of an application will be found in
${yarn.nodemanager.log-dirs}/application_${appid}.
Individual containers will have their log directories inside it, in directories named container_${contid}.
Each container dir will contain the stderr, stdout, and syslog files generated by that particular container.
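As a rough illustration of the layout described above, here is a minimal Python sketch that builds the expected local log directory for one container. The function name `container_log_dirs`, the application and container IDs, and the assumption that `yarn.nodemanager.log-dirs` is set to `/hadoop/yarn/log` (the path from the listing in the question) are all illustrative, not taken from a real cluster. The property can hold a comma-separated list of directories, so the sketch returns one candidate path per entry.

```python
from pathlib import PurePosixPath


def container_log_dirs(log_dirs, app_id, container_id):
    """Return one candidate log directory per configured NodeManager log dir.

    log_dirs is the raw value of yarn.nodemanager.log-dirs, which may be a
    comma-separated list; the container's logs live under exactly one of them.
    """
    return [
        PurePosixPath(d.strip()) / app_id / container_id
        for d in log_dirs.split(",")
        if d.strip()
    ]


# Hypothetical IDs and an assumed log-dirs value, for illustration only.
paths = container_log_dirs(
    "/hadoop/yarn/log",
    "application_1634100000000_0001",
    "container_1634100000000_0001_01_000001",
)
for p in paths:
    print(p)
```

On a NodeManager host, the stderr, stdout, and syslog files of a running container would then be expected inside the directory that actually exists among the printed candidates.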
Created ‎10-15-2021 05:59 PM
Thanks @Faizan_Ali for the explanation.
So in other words, once the job completes, are these logs stored in HDFS? If not, where are the logs stored after the application completes?
Are the YARN local and log dirs only for temporary use, while a job runs in the Hadoop cluster?
Thanks,
Created on ‎10-16-2021 03:04 AM - edited ‎10-16-2021 05:38 AM
- yarn application -list
This command lists only applications that are in the SUBMITTED, ACCEPTED, or RUNNING state.
- Log aggregation collects each container's logs and moves them to the directory configured in yarn.nodemanager.remote-app-log-dir, but only after the application completes. The application ID listed by the command belongs to an application that has not completed yet, so its logs have not been collected, and this is why you can't see them.
**TO ENABLE LOG AGGREGATION:**
Log aggregation is configured in the yarn-site.xml file. Setting the yarn.log-aggregation-enable property to true turns it on:
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
- Use yarn logs -applicationId <application ID> to view the logs of an application once it has completed.
- To list all finished applications, use yarn application -list -appStates FINISHED
- To list all applications, use yarn application -list -appStates ALL
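To check whether the yarn.log-aggregation-enable property from this answer is actually set, the configuration file can be parsed directly. The following is a minimal Python sketch using only the standard library; the helper name `get_property` and the inline `SAMPLE` document are illustrative assumptions, and on a real node you would read the contents of your own yarn-site.xml instead.

```python
import xml.etree.ElementTree as ET


def get_property(xml_text, name):
    """Return the value of a Hadoop-style <property> by name, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


# Stand-in for the contents of yarn-site.xml (assumed, for illustration).
SAMPLE = """<configuration>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>"""

enabled = get_property(SAMPLE, "yarn.log-aggregation-enable") == "true"
print("log aggregation enabled:", enabled)
```

If the property is missing from the file, the helper returns None and YARN falls back to its built-in default, so the file alone may not reflect the effective setting on a managed cluster.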
Created ‎10-16-2021 09:00 AM
@Faizan_Ali Thanks for the explanation.
Makes sense.
So while an application is running, its containers write their logs to the local directory "${yarn.nodemanager.log-dirs}/application_${appid}", and after the application completes, those logs are aggregated into yarn.nodemanager.remote-app-log-dir.
Ok thanks for the explanation.
