
Collect yarn logs from remote machine


New Contributor

Hi,

We have to implement a solution to download YARN application logs from a remote cluster machine whose credentials are not exposed to us. Is there a way to download the logs for a given application from that cluster? We cannot install the YARN client on our current machine, so we cannot use the "yarn logs -applicationId <appId>" command.

I also noticed that the logs are stored in HDFS under the remote app logs dir, but they seem to be in some other format (there are odd characters between words, like ^@^D). Would fetching those files be the correct way to get the application logs? If there is a different way, please let me know. Thanks in advance.

3 REPLIES

Re: Collect yarn logs from remote machine

Master Collaborator

It should be possible to write a script, in Python or in Bash with curl, that accesses the JobHistory server URL once you know the application_id. You would collect all the NodeManagers and container IDs, plus the ApplicationMaster container's log.

https://<jobhistory>:19890/jobhistory/logs/<nodemanagerN>:8041/<Container ID>/<Attempt ID>/hive/syslog?start=0

But if you are using Kerberos, you have probably secured those endpoints, so you have to authenticate.
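
As a rough illustration of the scripted approach, here is a minimal Python sketch that assembles a URL following the pattern quoted above and downloads it. The function names (`build_log_url`, `fetch_log`), the example host names, and the default port and log name are assumptions made for illustration. The plain `urlopen` call only works where the endpoint is not protected by Kerberos, and the response is likely to be the web UI page rather than a raw log file.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_log_url(history_host, node_id, container_id, attempt_id, user,
                  log_name="syslog", port=19890, start=0):
    """Assemble a JobHistory log URL following the pattern quoted in the reply.

    All parameter names are illustrative; node_id is "<nodemanager host>:8041".
    """
    parts = (node_id, container_id, attempt_id, user, log_name)
    # Keep ":" unescaped so the NodeManager "host:port" segment stays readable.
    path = "/".join(quote(part, safe=":") for part in parts)
    return f"https://{history_host}:{port}/jobhistory/logs/{path}?start={start}"


def fetch_log(url, timeout=30):
    """Download one log page (assumes an unauthenticated endpoint)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Building the URL needs no network access; the hosts and IDs are made up.
url = build_log_url("jhs.example.com", "nm1.example.com:8041",
                    "container_1_0001_01_000001", "attempt_1", "hive")
print(url)
```

Calling `fetch_log(url)` once per container would then collect the individual logs, provided the node and container IDs are already known.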

Re: Collect yarn logs from remote machine

New Contributor
Great! This was our preferred approach. The only problem is that, without the Timeline Server, we found no way to obtain the nodeId and containerId for a given application ID once the application has been killed, has failed, or has finished. The ResourceManager stores information only for running applications. Since the Timeline Server is still not fully supported by Cloudera, and our existing customers would not want to install an extra service just for this purpose, we are looking for alternative ways to obtain this information.

Re: Collect yarn logs from remote machine

New Contributor

Could you please explain how to fetch the container IDs and node information required by the history server, without the Timeline Server?
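
One possible route, for MapReduce jobs only, is the JobHistory server's REST API: its job-attempts resource reports a `nodeId` and `containerId` for each ApplicationMaster attempt. The sketch below is a hedged illustration, not a confirmed solution for this cluster. The helper names, the example host and IDs, and the choice of HTTPS on port 19890 (taken from the URL earlier in the thread) are assumptions, and a Kerberos-secured endpoint would need authentication that this code does not handle.

```python
import json
from urllib.request import Request, urlopen


def to_job_id(application_id):
    """Map application_<ts>_<n> to the MapReduce job ID job_<ts>_<n>."""
    return application_id.replace("application_", "job_", 1)


def job_attempts_url(history_host, job_id, port=19890, scheme="https"):
    """URL of the JobHistory REST resource listing ApplicationMaster attempts."""
    return (f"{scheme}://{history_host}:{port}"
            f"/ws/v1/history/mapreduce/jobs/{job_id}/jobattempts")


def extract_am_containers(payload):
    """Return (nodeId, containerId) pairs from a parsed jobattempts response."""
    attempts = payload.get("jobAttempts", {}).get("jobAttempt", [])
    return [(a["nodeId"], a["containerId"]) for a in attempts]


def fetch_am_containers(history_host, application_id):
    """Query the history server (assumes an unauthenticated endpoint)."""
    url = job_attempts_url(history_host, to_job_id(application_id))
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=30) as resp:
        return extract_am_containers(json.load(resp))


# Offline example with a made-up response in the documented shape.
sample = {"jobAttempts": {"jobAttempt": [{
    "id": 1,
    "nodeId": "nm1.example.com:8041",
    "containerId": "container_1326232085508_0004_01_000001",
}]}}
print(extract_am_containers(sample))
```

This covers only the ApplicationMaster containers. The same API's task-attempt resources expose `assignedContainerId` and `nodeHttpAddress` for individual tasks, which could be walked in a similar way; non-MapReduce applications would need a different source.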