In which format are YARN container logs stored in HDFS?
Labels: Apache Pig, Apache YARN, HDFS
Created on 10-28-2015 10:37 PM - edited 09-16-2022 02:46 AM
I went into /app-logs/<username>/ to get the logs, but I can't tell what format these files are stored in. I fetched one of the files and ran 'file' on it, but it just reports 'data'. hdfs dfs -text also yields only garbled text. We want to run some Pig jobs on the container logs to gain some insights.
Created 10-28-2015 10:55 PM
According to the Azure blog, the YARN container logs under /app-logs are not directly readable because they are written as TFile, a binary format indexed by container. Normally you can use the yarn CLI tool, which writes the log contents to stdout:
yarn logs -applicationId <applicationId>
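Since the goal is to analyze the logs in Pig, one option is to turn the stdout of the command above into tab-separated text first. The following is only a minimal sketch, not a tested tool. It assumes each container section in the `yarn logs` output starts with a line of the form "Container: <containerId> on <host>", which may differ between Hadoop versions, so check it against your own output. The names `tag_lines` and `dump_tsv` are made up for this illustration.

```python
import re
import subprocess
import sys

# Assumption: container sections begin with "Container: <id> on <host_port>".
# Verify this against the output of your Hadoop version before relying on it.
CONTAINER_HEADER = re.compile(r"^Container: (\S+) on (\S+)")


def tag_lines(lines):
    """Yield (container_id, line) pairs for every non-empty log line."""
    container = "unknown"  # used until the first header line is seen
    for raw in lines:
        line = raw.rstrip("\n")
        match = CONTAINER_HEADER.match(line)
        if match:
            container = match.group(1)
            continue  # the header itself is not emitted as a log line
        if line.strip():
            yield container, line


def dump_tsv(application_id, out=sys.stdout):
    """Run `yarn logs` for one application and write container<TAB>line rows."""
    proc = subprocess.Popen(
        ["yarn", "logs", "-applicationId", application_id],
        stdout=subprocess.PIPE,
        text=True,
    )
    for container, line in tag_lines(proc.stdout):
        # Replace tabs inside the log line so each row keeps exactly two fields.
        out.write(container + "\t" + line.replace("\t", " ") + "\n")
    return proc.wait()
```

Calling `dump_tsv("<applicationId>")` on a node with the yarn client installed would print rows that can be written to HDFS and loaded in Pig with the default PigStorage loader. The other lines that `yarn logs` prints around the log contents (such as the LogType lines) are passed through as ordinary rows in this sketch.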
Created 10-29-2015 12:16 AM
Good pointer on TFile. TFiles can be read: I just loaded them in Pig using org.apache.tez.tools.TFileLoader, which is in tez-tools (built from source from git).
Created 10-31-2015 12:53 PM
Could you share the code or an example of loading the logs into Pig?
Created 11-04-2015 11:05 PM
REGISTER /tmp/tez-tfile-parser-0.8.2-SNAPSHOT.jar;
yarnlogs = LOAD '/app-logs/hdfs/logs/**/*' USING org.apache.tez.tools.TFileLoader();
lines_with_fetchertime = FILTER yarnlogs BY $2 matches '.*freed by fetcher.*';
This is the code I used to extract specific text from the logs. However, TFileLoader in tez-tools does not seem to scale well when it is given a folder with a ton of logs. I believe tez-tools is also not part of HDP, so you need to build it separately. It worked well on smaller datasets and ran into issues on bigger ones.
Thanks
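One thing that might be worth trying for the larger datasets is to feed the loader a few application directories at a time instead of the whole folder. This is an untested idea, not a confirmed fix for the scaling issue. The sketch below only builds the input globs; the helper name `batch_globs`, the batch size, and the application directory names are hypothetical, and the base path is taken from the LOAD statement above.

```python
def batch_globs(app_dirs, batch_size, base="/app-logs/hdfs/logs"):
    """Group application directory names into Hadoop-style brace globs.

    Each returned string matches the files of at most `batch_size`
    application directories under `base`.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    globs = []
    for start in range(0, len(app_dirs), batch_size):
        batch = app_dirs[start:start + batch_size]
        # {a,b,c} is the alternation syntax of Hadoop path globs.
        globs.append(base + "/{" + ",".join(batch) + "}/*")
    return globs


# Hypothetical directory names, for illustration only.
for pattern in batch_globs(["application_A", "application_B", "application_C"], 2):
    print(pattern)
```

Each pattern could then be handed to the script through Pig parameter substitution, for example by writing LOAD '$input' in the script and passing -param input=<pattern> on the pig command line, so that every run handles a smaller set of files.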
Created 12-19-2017 10:43 AM
Using the yarn command does not cover the use case of an on-demand HDInsight cluster, where the cluster is created to run the pipeline and then deleted. One approach is to use https://github.com/shanyu/hadooplogparser.
Is there any option to configure the YARN logger to produce plain text instead of the binary TFile format?