Created on 10-28-2015 10:37 PM - edited 09-16-2022 02:46 AM
I went into /app-logs/<username>/ to get the logs, but I can't tell what format these files are stored in. I copied one locally and ran 'file' against it, but it only reports 'data'. hdfs dfs -text also just yields garbled text. We are looking to run some Pig jobs over the container logs to gain some insights.
Created 10-28-2015 10:55 PM
According to the Azure blog, the YARN container logs under /app-logs are not directly readable: they are written in TFile, a binary format indexed by container. Normally you can use the yarn CLI tool, which emits the content to stdout:
yarn logs -applicationId <applicationId>
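If the goal is to process the logs with Pig afterwards, a minimal sketch is to dump the text and put it back into HDFS as plain files (assuming the yarn CLI is available and your user can read the aggregated logs; the application ID and paths below are illustrative):
# dump one application's aggregated logs to plain text
yarn logs -applicationId application_1445954705987_0001 > app_0001.txt
# stage the text back into HDFS so Pig can read it
hdfs dfs -mkdir -p /tmp/yarn-logs-text
hdfs dfs -put app_0001.txt /tmp/yarn-logs-text/
The resulting files are ordinary text, so they can be loaded with PigStorage or TextLoader instead of a TFile-aware loader.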
Created 10-29-2015 12:16 AM
Good pointer on TFile. We can read TFiles. I just loaded them in Pig using org.apache.tez.tools.TFileLoader, which is in tez-tools (built from source from git).
Created 10-31-2015 12:53 PM
Could you share some code or an example of loading these into Pig?
Created 11-04-2015 11:05 PM
REGISTER /tmp/tez-tfile-parser-0.8.2-SNAPSHOT.jar;
yarnlogs = LOAD '/app-logs/hdfs/logs/**/*' USING org.apache.tez.tools.TFileLoader();
lines_with_fetchertime = FILTER yarnlogs BY $2 matches '.*freed by fetcher.*';
This is the code I used to extract specific text from the logs. However, TFileLoader in tez-tools does not seem to scale well when you pass it a folder with a ton of logs. tez-tools is also, I believe, not part of HDP; you need to build it separately. It worked well on smaller datasets but ran into issues on bigger ones.
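For reference, a slightly fuller sketch of the same idea, narrowing the glob to a single application to keep the input size down (the application ID and output path here are illustrative, not from my actual job):
REGISTER /tmp/tez-tfile-parser-0.8.2-SNAPSHOT.jar;
-- load only one application's containers instead of the whole /app-logs tree
-- (application ID is illustrative)
yarnlogs = LOAD '/app-logs/hdfs/logs/application_1445954705987_0001/*' USING org.apache.tez.tools.TFileLoader();
-- same filter as above; $2 is the field carrying the log line text
lines_with_fetchertime = FILTER yarnlogs BY $2 matches '.*freed by fetcher.*';
-- write the matching lines out for later jobs (output path is illustrative)
STORE lines_with_fetchertime INTO '/tmp/fetcher_lines' USING PigStorage('\t');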
Thanks
Created 12-19-2017 10:43 AM
This method of using the yarn command does not cover the use case of running an HDInsight cluster on demand, where the cluster is created to run the pipeline and then deleted. One approach is to use https://github.com/shanyu/hadooplogparser .
Is there any option to configure the YARN logger to produce plain text instead of the TFile binary format?