New Contributor
Posts: 2
Registered: ‎08-24-2018

Collect yarn logs from remote machine

Hi,

We need to implement a solution that downloads YARN application logs from a remote cluster whose credentials are not exposed to us. Is there a way to download the logs for a given application from that cluster? We cannot install the YARN client on our current machine, so we cannot use the "yarn logs -applicationId <appId>" command.

Also, I noticed that the logs are stored in HDFS under the remote app logs dir, but they seem to be in some other format (there are odd characters such as ^@^D between words). Would fetching those files be the correct way to get the application logs? If there is a different way, please let me know. Thanks in advance.

Master
Posts: 426
Registered: ‎07-01-2015

Re: Collect yarn logs from remote machine

It should be possible to write a script, in Python or in bash with curl, that accesses the JobHistory server URL. Given that you know the application_id, you collect all the NodeManagers and container IDs, plus the ApplicationMaster container's log.

https://<jobhistory>:19890/jobhistory/logs/<nodemanagerN>:8041/<Container ID>/<Attempt ID>/hive/syslog?start=0

But if you are using Kerberos, you have probably secured those endpoints, so you have to authenticate.
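
As a rough illustration of the scripted approach, here is a minimal Python sketch that assembles a URL in the shape shown above and downloads it. The function names, the example host names, and the idea of passing the user ("hive" in the URL above) as a parameter are assumptions for illustration only. Authentication is not handled at all, so on a Kerberos-secured cluster this request would be rejected as written, and the response is probably an HTML page wrapping the log text rather than the raw log.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_log_url(jobhistory_host, node_id, container_id, attempt_id, user,
                  log_name="syslog", port=19890, scheme="https", start=0):
    """Assemble a JobHistory log URL following the pattern in the post above.

    node_id is the NodeManager address including its port, e.g. "nm1:8041".
    """
    # Escape each path segment but keep ':' so "host:8041" stays readable.
    parts = (node_id, container_id, attempt_id, user, log_name)
    path = "/".join(quote(part, safe=":") for part in parts)
    return f"{scheme}://{jobhistory_host}:{port}/jobhistory/logs/{path}?start={start}"


def fetch_log(url, timeout=30):
    """Download the page at url and return it as text.

    Assumption: the endpoint is reachable without authentication.
    """
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling build_log_url once per (NodeManager, container) pair and passing each result to fetch_log would collect the logs for one application, provided those pairs are already known.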

New Contributor
Posts: 2
Registered: ‎08-24-2018

Re: Collect yarn logs from remote machine

Great!! This was our preferred approach. The only problem was that, without the timeline server, we found no way to obtain the nodeId and containerId for a given application ID once the application is killed, failed, or finished. The ResourceManager stores this information only for running applications. Since the timeline server is still not fully supported by Cloudera, and our existing customers would not want to install an extra service on demand just for this purpose, we were looking for alternative ways to obtain this info.
New Contributor
Posts: 2
Registered: ‎01-29-2019

Re: Collect yarn logs from remote machine

Could you please explain how to fetch the containerIds and node info required by the history server, without the timeline server?
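
One possible direction, offered only as a sketch and only for MapReduce jobs: the JobHistory server exposes a REST API under /ws/v1/history/mapreduce, and its jobattempts resource reports nodeId, containerId, and logsLink for each ApplicationMaster attempt of a finished job. The Python below assumes that API is enabled and reachable without authentication. The helper names are hypothetical, and nothing in this thread confirms that the approach covers non-MapReduce applications.

```python
import json
from urllib.request import Request, urlopen


def to_job_id(application_id):
    """Map a YARN application ID to the matching MapReduce job ID."""
    return application_id.replace("application_", "job_", 1)


def job_attempts_url(jobhistory_base, job_id):
    """Build the jobattempts REST URL; jobhistory_base is e.g. "http://host:19888"."""
    base = jobhistory_base.rstrip("/")
    return f"{base}/ws/v1/history/mapreduce/jobs/{job_id}/jobattempts"


def am_containers(payload):
    """Return (nodeId, containerId, logsLink) for each AM attempt in the parsed JSON."""
    attempts = (payload.get("jobAttempts") or {}).get("jobAttempt") or []
    return [(a.get("nodeId"), a.get("containerId"), a.get("logsLink"))
            for a in attempts]


def fetch_am_containers(jobhistory_base, application_id, timeout=30):
    """Query the JobHistory server (assumed unauthenticated) for AM container info."""
    url = job_attempts_url(jobhistory_base, to_job_id(application_id))
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return am_containers(json.load(resp))
```

The same API also has a per-task attempts resource (jobs/{jobid}/tasks/{taskid}/attempts) that reports assignedContainerId and nodeHttpAddress for task containers, which could be walked in a similar way.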