We need to implement a solution to download YARN application logs from a remote cluster whose credentials are not exposed to us. Is there a way to download the logs for a given application from that cluster, given that we cannot install the YARN client on our current machine and therefore cannot use the "yarn logs -applicationId <appId>" command?
I also noticed that the logs are stored in HDFS under the remote app logs directory, but they seem to be in some other format (they have odd characters between words, such as ^@^D). Is fetching those files the correct way to get the application logs? If there is a different way, please let me know. Thanks in advance.
It is probably possible to write a script, in Python or in bash with curl, that accesses the JobHistory server URL. Given that you know the application_id, you would collect all the NodeManagers and container IDs, and also the log of the ApplicationMaster's container. The URL has this form:
https://<jobhistory>:19890/jobhistory/logs/<nodemanagerN>:8041/<Container ID>/<Attempt ID>/hive/syslog?start=0
But if you are using Kerberos, those endpoints are probably secured, so you will have to authenticate.
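As a rough illustration of the URL pattern above, here is a minimal Python sketch that assembles the JobHistory log URL and fetches it using only the standard library. The function names `build_log_url` and `fetch_log` are hypothetical, and the host names in the usage part are placeholders. It assumes that the `hive` segment in the URL is the name of the user who ran the job and that `syslog` is the log file you want. It does not handle Kerberos or any other authentication, so on a secured cluster the fetch will be rejected until authentication is added.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_log_url(jobhistory_host, nodemanager_host, container_id, attempt_id,
                  user="hive", log_name="syslog", start=0,
                  jobhistory_port=19890, nodemanager_port=8041):
    """Assemble a JobHistory log URL following the pattern shown above.

    Assumption: 'user' fills the 'hive' segment of the example URL.
    """
    parts = (
        f"{nodemanager_host}:{nodemanager_port}",
        container_id,
        attempt_id,
        user,
        log_name,
    )
    # Percent-encode each path segment, keeping ':' for the host:port segment.
    path = "/".join(quote(str(part), safe=":") for part in parts)
    query = urlencode({"start": start})
    return (f"https://{jobhistory_host}:{jobhistory_port}"
            f"/jobhistory/logs/{path}?{query}")


def fetch_log(url, timeout=30):
    """Download one log page; only works if no authentication is required."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Placeholder hosts and IDs; replace them with values from your cluster.
    url = build_log_url("jobhistory.example.com", "nodemanager1.example.com",
                        "container_1_0001_01_000001",
                        "attempt_1_0001_m_000000_0")
    print(url)
```

You would call `fetch_log(url)` once per container and write the result to a file. The response may be an HTML page rather than plain text, so some post-processing could be needed.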
Could you please explain how to fetch the container IDs and node info that the history server requires, without using the timeline server?