Support Questions

ryu · ‎10-13-2021

Hi experts,

I would like to confirm my understanding of the YARN local and log dirs, or get help understanding them better.

My understanding is that YARN downloads the data a job needs to a local filesystem so that it is easily accessible while the job runs, and also writes the logs for that application there. I believe these are temporary files, since they are stored in HDFS after the job completes.

[root@test01 log]# ll /hadoop/yarn/
total 0
drwxr-xr-x. 6 yarn hadoop 78 Oct 13 08:09 local
drwxrwxr-x. 8 yarn hadoop 239 Oct 13 08:09 log

Can someone please help confirm my understanding or help me better understand this concept?

Also, what is the usual best practice for mounting these directories: should they be on the local filesystem or on a separate hard drive, or can they share a hard drive with one of the DataNode directories?

Any help is much appreciated.

Thanks,

Faizan_Ali · ‎10-15-2021

The localized log directory of an application will be found in

${yarn.nodemanager.log-dirs}/application_${appid}.

Individual containers will have their log directories below this, in subdirectories named container_${contid}.

Each container directory will contain the stderr, stdout, and syslog files generated by that particular container.
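
As a rough illustration of that layout, here is a minimal Python sketch that builds the local log paths you would expect to find for one container. The function name `container_log_files` and the sample IDs are hypothetical, and `/hadoop/yarn/log` is only assumed to be the value of yarn.nodemanager.log-dirs based on the listing earlier in the thread; check your own yarn-site.xml for the real setting.

```python
from pathlib import PurePosixPath

# Files a YARN container typically writes into its log directory.
LOG_FILES = ("stderr", "stdout", "syslog")


def container_log_files(log_dir, app_id, container_id):
    """Return the expected log file paths for one container.

    log_dir is one entry of yarn.nodemanager.log-dirs (an assumed value here).
    """
    container_dir = PurePosixPath(log_dir) / app_id / container_id
    return [str(container_dir / name) for name in LOG_FILES]


if __name__ == "__main__":
    # Hypothetical application and container IDs, for illustration only.
    for path in container_log_files(
        "/hadoop/yarn/log",
        "application_1634112345678_0001",
        "container_1634112345678_0001_01_000001",
    ):
        print(path)
```

Not every application writes all three files, so treat the list as what to look for rather than a guarantee.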

ryu · ‎10-15-2021

Thanks @Faizan_Ali for the explanation.

So in other words, once the job completes, are these logs stored in HDFS? If not, where are the logs stored after the application completes?

Are the YARN local and log dirs only for temporary use while a job runs in the Hadoop cluster?

Thanks,

Faizan_Ali · ‎10-16-2021

yarn application -list

This command lists only the applications that are in the SUBMITTED, ACCEPTED, or RUNNING state.

Log aggregation collects each container's logs and moves them to the directory configured in yarn.nodemanager.remote-app-log-dir, but only after the application completes. An application ID listed by the command above has not completed yet, so its logs have not been collected, and this is why you can't see them.
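
To make the timing concrete, the rule can be sketched in Python: while an application is in a non-final state its logs are still on the NodeManager's local disk, and only once it reaches a final state can they be expected in the aggregated location. The function name `where_are_logs` is hypothetical; the state names are the ones YARN reports for applications. The sketch assumes aggregation works as described here and ignores the short delay between completion and the upload finishing.

```python
# States in which a YARN application has not finished yet.
ACTIVE_STATES = {"NEW", "NEW_SAVING", "SUBMITTED", "ACCEPTED", "RUNNING"}
# States in which the application is done, so aggregation can take place.
FINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def where_are_logs(state, aggregation_enabled=True):
    """Return the config property that points at the logs for this state."""
    state = state.upper()
    if state in ACTIVE_STATES:
        # Still running (or waiting to run): logs are local to the node.
        return "yarn.nodemanager.log-dirs"
    if state in FINAL_STATES:
        if aggregation_enabled:
            # Done and aggregated: logs were moved to the remote directory.
            return "yarn.nodemanager.remote-app-log-dir"
        # Done but aggregation is off: logs stay on the node's local disk.
        return "yarn.nodemanager.log-dirs"
    raise ValueError(f"unknown application state: {state}")


if __name__ == "__main__":
    for s in ("RUNNING", "FINISHED"):
        print(s, "->", where_are_logs(s))
```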

**TO ENABLE LOG AGGREGATION:**

Log aggregation is turned on in the yarn-site.xml file by setting the yarn.log-aggregation-enable property to true. As described above, the logs are then collected after an application completes.

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

Once the application has completed, use yarn logs -applicationId <application ID> to view its logs.
To list all finished applications, use yarn application -list -appStates FINISHED
To list all applications, use yarn application -list -appStates ALL
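
If you want to script these commands, one possible approach is to build the argument lists in Python and hand them to subprocess. This is only a sketch: the helper names `list_apps_cmd`, `app_logs_cmd`, and `run` are made up for illustration, the application ID is a placeholder, and it assumes the yarn client is on the PATH of the machine running the script. It uses the -appStates spelling shown in the YARN CLI help.

```python
import subprocess


def list_apps_cmd(states=None):
    """Build the argv for `yarn application -list`, optionally filtered."""
    cmd = ["yarn", "application", "-list"]
    if states:
        # -appStates takes a comma-separated list such as FINISHED,KILLED.
        cmd += ["-appStates", ",".join(states)]
    return cmd


def app_logs_cmd(app_id):
    """Build the argv for fetching the aggregated logs of one application."""
    return ["yarn", "logs", "-applicationId", app_id]


def run(cmd):
    """Run a command and return its standard output as text."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only print the commands here; call run(...) on a host with yarn installed.
    print(" ".join(list_apps_cmd(["FINISHED"])))
    print(" ".join(app_logs_cmd("application_1634112345678_0001")))
```

Keeping the command construction separate from `run` makes it easy to check what would be executed before touching the cluster.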

ryu · ‎10-16-2021

@Faizan_Ali Thanks for the explanation.

Makes sense.
So while an application is running, it writes the container logs to the local directory "${yarn.nodemanager.log-dirs}/application_${appid}", and after the application completes, the logs are aggregated into yarn.nodemanager.remote-app-log-dir.

Ok thanks for the explanation.

Can someone explain what the yarn local and log dirs do?