Created on 08-24-2018 02:36 AM - edited 09-16-2022 06:37 AM
Hi,
We need to implement a solution to download YARN application logs from a remote cluster whose credentials are not exposed to us. Is there a way to download the logs for a given application from that cluster? We cannot install the YARN client on our current machine, so we cannot use the "yarn logs -applicationId <appId>" command.
I also noticed that the logs are stored in HDFS under the remote app logs dir, but they seem to be in some other format (there are odd characters between words, such as ^@^D). Would fetching those files be the correct way to get the application logs? If there is a different way, please let me know. Thanks in advance.
Created 08-24-2018 06:14 AM
It should be possible to write a script in Python, or in bash using curl, that accesses the JobHistory server URL. Given that you know the application_id, you would collect all the NodeManagers and container IDs, plus the log of the ApplicationMaster container.
https://<jobhistory>:19890/jobhistory/logs/<nodemanagerN>:8041/<Container ID>/<Attempt ID>/hive/syslog?start=0
But if you are using Kerberos, those endpoints are probably secured, so you will have to authenticate.
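As a rough sketch of the script idea, something like the following Python could build the JobHistory log URL from the pattern above and fetch it. The function names and the example host, container and attempt values are hypothetical, and I am assuming the "hive" path segment is the user who submitted the job. It uses plain urllib with no authentication, so on a Kerberos-secured cluster the fetch would need a SPNEGO-capable client instead.

```python
from urllib.request import urlopen


def build_log_url(jobhistory_host, nodemanager, container_id, attempt_id,
                  user="hive", log_name="syslog", start=0,
                  jh_port=19890, nm_port=8041, scheme="https"):
    """Build a JobHistory log URL following the pattern quoted in this thread.

    The defaults (ports, "hive", "syslog") are taken from that example URL;
    treating "hive" as the submitting user is an assumption.
    """
    return (
        f"{scheme}://{jobhistory_host}:{jh_port}/jobhistory/logs/"
        f"{nodemanager}:{nm_port}/{container_id}/{attempt_id}/"
        f"{user}/{log_name}?start={start}"
    )


def fetch_log(url, timeout=30):
    """Fetch the page at `url` and return the body as text.

    No authentication is done here, so this only works against an
    unsecured endpoint.
    """
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Hypothetical values; replace them with the ones from your cluster.
    url = build_log_url(
        "jobhistory.example.com",
        "nodemanager1.example.com",
        "container_1535000000000_0001_01_000001",
        "attempt_1535000000000_0001_m_000000_0",
    )
    print(url)
    # print(fetch_log(url))  # uncomment once the URL points at a real server
```

You would call build_log_url once per container and write each result of fetch_log to a local file.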
Created 08-24-2018 06:32 AM
Created 01-29-2019 11:53 PM
Could you please explain how to fetch the container IDs and node info required by the history server, without the timeline server?