Workaround for log aggregation bug
Labels: Apache Ambari
Created 02-10-2016 03:29 PM
We have log aggregation enabled in the YARN configuration for our cluster (yarn.log-aggregation-enable), but it doesn't seem to work.
When I try to drill into the history of a job in the ResourceManager UI, the "logs" link always takes me to a page that says "aggregation is not enabled".
I opened a support ticket about this and was told we needed to upgrade. We did, but it didn't help.
I've opened another ticket and am currently waiting for a response.
In the meantime, has anyone seen this?
Is there a known hack to fix it?
Any advice on where to look for the solution?
We're currently on 2.2.8.
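A useful first check is whether the setting actually reached the NodeManager hosts, since that is where log aggregation runs. The function below is a minimal sketch for reading one property out of a `*-site.xml` file. The name `get_site_property` is hypothetical, `/etc/hadoop/conf/yarn-site.xml` is an assumed (HDP-typical) location, and the grep/sed approach only handles the usual layout where `<value>` sits on the same line as `<name>` or the line after it; it is not a real XML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> of one property from a Hadoop-style
# *-site.xml file. Assumes <value> is on the same line as <name> or on the
# line right after it (the layout Ambari writes). Not a general XML parser.
get_site_property() {
  local file=$1 key=$2
  # -F: match the key literally (dots are not regex wildcards here).
  # -A1: also emit the line after the <name> line, where <value> usually is.
  grep -F -A1 "<name>${key}</name>" "$file" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' \
    | head -n 1
}
```

For example, `get_site_property /etc/hadoop/conf/yarn-site.xml yarn.log-aggregation-enable` should print `true` on every NodeManager host if the setting has been deployed there; empty output means the property is not set in that file.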
Created 02-11-2016 06:33 PM
@Zack Riesland, thanks for looking at this with me over WebEx. It turned out that the ownership of the mr-history directory was what kept aggregation from working on the web side: it needed to be owned by mapred:hdfs (user mapred, group hdfs). The initial failure to start was caused by the incorrect class we found, which we fixed.
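The ownership fix can be sketched as a small shell function. This is a hedged sketch, not the exact commands run during the WebEx session: the name `fix_mr_history_owner` is hypothetical, `/mr-history` is an assumed location (check `mapreduce.jobhistory.done-dir` and `mapreduce.jobhistory.intermediate-done-dir` in mapred-site.xml for the real paths), and it has to run as the HDFS superuser, because only the superuser can change a file's owner in HDFS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: give the MapReduce JobHistory directory in HDFS the
# ownership described in the answer (user mapred, group hdfs).
# Run as the HDFS superuser, e.g. from `sudo -u hdfs bash`.
# /mr-history is an assumed default; pass your own path as $1 if it differs.
fix_mr_history_owner() {
  local dir=${1:-/mr-history}
  # List the entries under $dir with their current owner and group.
  hdfs dfs -ls "$dir"
  # Recursively set owner mapred and group hdfs.
  hdfs dfs -chown -R mapred:hdfs "$dir" || return 1
  # List again to confirm the change.
  hdfs dfs -ls "$dir"
}
```

Usage: `fix_mr_history_owner` for the default path, or `fix_mr_history_owner /some/other/dir`.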
Created 02-10-2016 04:09 PM
Have you tried to retrieve the logs using the YARN CLI?
yarn logs -applicationId <id of the application>
This streams the aggregated logs to the screen, provided you have HDFS access to the log files. If you lack permission to read them, you will see a message saying that aggregation is not enabled. In that case, add the application owner to the command:
yarn logs -appOwner <user id> -applicationId <id of the application>
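The two commands can be combined into a small wrapper that tries the current user first and falls back to the application owner. This is a sketch under the assumption that `yarn logs` exits with a non-zero status when it cannot find the aggregated logs under the caller's own user (the "not enabled" case); the function name `fetch_app_logs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two `yarn logs` invocations above.
# Assumption: `yarn logs` exits non-zero when it cannot find the aggregated
# logs under the current user's log directory.
fetch_app_logs() {
  local app_id=$1 owner=${2:-}
  # First try as the current user.
  if yarn logs -applicationId "$app_id"; then
    return 0
  fi
  # Fall back to the application owner's logs, if an owner was given.
  if [ -n "$owner" ]; then
    yarn logs -appOwner "$owner" -applicationId "$app_id"
  else
    echo "no logs for $app_id as $(whoami); pass the app owner as 2nd arg" >&2
    return 1
  fi
}
```

For example: `fetch_app_logs application_1455000000000_0001 etl`, where both the application ID and the owner `etl` are placeholder values.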
Created 02-10-2016 04:18 PM
Thanks, Terry.
Both of these approaches work: I get back the relevant logs for a given application ID.
But I'm interested in (and tasked with) getting the UI links to work, so that they're simple to use for everyone on our team.
Created 02-10-2016 04:36 PM
I just got a reply from my support ticket and it was literally a link to this thread.
So I guess we better figure it out here!
Created 02-10-2016 04:39 PM
@Zack Riesland, if you have a dev cluster, try enabling the Timeline Server (TS) there step by step. Also look for any deprecated properties.
Created 02-10-2016 04:44 PM
We don't have a dev cluster. I am the only one who needs the cluster today, so I can break stuff as long as it is put back together by tonight's ingest.
It sounds like that's the only way I'm going to get this to work...
Created 02-10-2016 04:50 PM
@Zack Riesland, you can roll back to the old configs when you're done if it doesn't work. Take some standard precautions, like backing up yarn-site.xml. Definitely post your results here.
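A minimal sketch of that precaution, assuming the client configs live in `/etc/hadoop/conf` (typical on HDP, where that path is often a symlink; `-L` makes `cp` copy the real files instead of the link). The function name `backup_conf_dir` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: snapshot a Hadoop config directory before experimenting.
# /etc/hadoop/conf is an assumed default; pass another directory as $1.
# Prints the path of the backup copy on success.
backup_conf_dir() {
  local conf_dir=${1:-/etc/hadoop/conf}
  local dest
  dest="${conf_dir%/}.bak-$(date +%Y%m%d-%H%M%S)"
  # -R: recurse, -L: follow symlinks (copy real files), -p: keep modes/times.
  cp -RLp "$conf_dir" "$dest" && echo "$dest"
}
```

To roll back, copy the files back, e.g. `cp -Rp "$backup"/. /etc/hadoop/conf/`, where `$backup` is the path the function printed. On an Ambari-managed cluster, Ambari rewrites managed config files when it pushes configs and keeps a version history for each service's configurations, so rolling back through Ambari is usually the cleaner option there.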
Created 02-10-2016 04:53 PM
Thanks.
Another follow-up:
The instructions say:
yarn.timeline-service.entity-group-fs-store.active-dir
and yarn.timeline-service.entity-group-fs-store.done-dir
must exist on the cluster on HDFS. Active-dir should have permission 01777, owned by yarn, group admin-group. Done-dir should have permission 0700, owned by yarn, group admin-group.
Two things:
1) 01777 isn't a valid permission set.
2) When it says 'admin-group', does it literally mean the group should be set to 'admin-group', or just a group with admin privileges? Almost everything in HDFS seems to be in either the 'hadoop' or the 'hdfs' group.
Created 02-10-2016 05:06 PM
@Zack Riesland 1. I believe they mean 1777, which sets the sticky bit. https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html
2. Yes, a group with admin privileges, not literally admin-group.
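Putting the quoted instructions together with these answers (mode 1777 rather than 01777, and a real admin group rather than a group literally named admin-group), creating the two directories might look like the sketch below. The function name `create_ats_dirs` is hypothetical, the paths must be taken from the two `entity-group-fs-store` properties in your yarn-site.xml, and the group is whatever admin group your cluster uses. Run it as the HDFS superuser so that `chown` is permitted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the ATS entity-group-fs-store directories.
# Run as the HDFS superuser (e.g. from `sudo -u hdfs bash`).
#   $1  value of yarn.timeline-service.entity-group-fs-store.active-dir
#   $2  value of yarn.timeline-service.entity-group-fs-store.done-dir
#   $3  your cluster's admin group (site-specific; an assumption)
create_ats_dirs() {
  [ $# -eq 3 ] || { echo "usage: create_ats_dirs ACTIVE_DIR DONE_DIR GROUP" >&2; return 2; }
  local active=$1 done_dir=$2 group=$3
  hdfs dfs -mkdir -p "$active" "$done_dir" || return 1
  hdfs dfs -chown "yarn:$group" "$active" "$done_dir" || return 1
  # 1777 = sticky bit plus rwx for everyone
  hdfs dfs -chmod 1777 "$active" || return 1
  # 0700 = rwx for the owner (yarn) only
  hdfs dfs -chmod 0700 "$done_dir"
}
```

For example: `create_ats_dirs <active-dir> <done-dir> hadoop`, where `hadoop` is only an illustrative group name. Listing the parent directory with `hdfs dfs -ls` should then show the active dir as `drwxrwxrwt`; the trailing `t` is the sticky bit, which lets every user create files there while only the superuser, the directory owner, or the file's owner can delete or move them.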
Created 02-10-2016 04:43 PM
@Mark Herring FYI
