Created on 10-28-2015 10:37 PM - edited 09-16-2022 02:46 AM
I went into /app-logs/<username>/ to get the logs, but I can't tell what format these files are stored in. I copied one locally and ran 'file' against it, but it only reports 'data'. hdfs dfs -text also just yields garbled text. We are looking to run some Pig jobs over the container logs to gain some insights.
Created 10-28-2015 10:55 PM
According to the Azure blog, the YARN container logs under /app-logs are not directly readable: they are written in TFile, a binary format indexed by container. Normally you can use the yarn CLI tool, which emits the content to stdout:
yarn logs -applicationId <applicationId>
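If the goal is to process the logs with Pig afterwards, a minimal sketch is to dump the text and put it back into HDFS as plain files (assuming the yarn CLI is available and your user can read the aggregated logs; the application ID and paths below are illustrative):
# dump one application's aggregated logs to plain text
yarn logs -applicationId application_1445954705987_0001 > app_0001.txt
# stage the text back into HDFS so Pig can read it
hdfs dfs -mkdir -p /tmp/yarn-logs-text
hdfs dfs -put app_0001.txt /tmp/yarn-logs-text/
The resulting files are ordinary text, so they can be loaded with PigStorage or TextLoader instead of a TFile-aware loader.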
Created 10-29-2015 12:16 AM
Good pointer on TFile. We can read TFiles. I just loaded them in Pig using org.apache.tez.tools.TFileLoader, which is in tez-tools (built from source from git).
Created 10-31-2015 12:53 PM
Could you share some code or an example of loading these into Pig?
Created 11-04-2015 11:05 PM
REGISTER /tmp/tez-tfile-parser-0.8.2-SNAPSHOT.jar;
yarnlogs = LOAD '/app-logs/hdfs/logs/**/*' USING org.apache.tez.tools.TFileLoader();
lines_with_fetchertime = FILTER yarnlogs BY $2 matches '.*freed by fetcher.*';
This is the code I used to extract specific text from the logs. However, TFileLoader in tez-tools does not seem to scale well when you pass it a folder with a ton of logs. tez-tools is also, I believe, not part of HDP; you need to build it separately. It worked well on smaller datasets but ran into issues on bigger ones.
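For reference, a slightly fuller sketch of the same idea, narrowing the glob to a single application to keep the input size down (the application ID and output path here are illustrative, not from my actual job):
REGISTER /tmp/tez-tfile-parser-0.8.2-SNAPSHOT.jar;
-- load only one application's containers instead of the whole /app-logs tree
-- (application ID is illustrative)
yarnlogs = LOAD '/app-logs/hdfs/logs/application_1445954705987_0001/*' USING org.apache.tez.tools.TFileLoader();
-- same filter as above; $2 is the field carrying the log line text
lines_with_fetchertime = FILTER yarnlogs BY $2 matches '.*freed by fetcher.*';
-- write the matching lines out for later jobs (output path is illustrative)
STORE lines_with_fetchertime INTO '/tmp/fetcher_lines' USING PigStorage('\t');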
Thanks
Created 12-19-2017 10:43 AM
This method of using the yarn command does not cover the use case of running an HDInsight cluster on demand, where the cluster is created to run the pipeline and then deleted. One approach is to use https://github.com/shanyu/hadooplogparser .
Is there any option to configure the YARN logger to produce plain text instead of the TFile binary format?