Support Questions

zack_riesland · ‎02-10-2016

We have log aggregation enabled in the Yarn configuration for our cluster (yarn.log-aggregation-enable).

But it doesn't seem to work.

When I try to drill into the history of a job in the resource manager GUI, the link for "logs" always takes me to a page that says: "aggregation is not enabled".

I've opened a ticket asking for help on this, and they told us we need to upgrade, so we did, but it didn't help.

I opened another ticket and am currently waiting for a response.

In the meantime, has anyone seen this?

Is there a known hack to fix it?

Any advice about where to look for the solution?

We're currently on 2.2.8

iroberts · ‎02-11-2016

@Zack Riesland, thanks for looking at this with me over webex. It turns out the ownership of the mr-history directory was what kept aggregation from working on the web side; it needed to be owned by mapred and hdfs. The initial failure to start was due to the incorrect class, which we fixed.
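For anyone who wants to check the same thing on their own cluster, here is a rough sketch. The path /mr-history and the reading of "mapred and hdfs" as owner mapred, group hdfs are assumptions, not something confirmed in this thread, and the run helper is purely illustrative. It only prints the commands unless RUN=1 is set, and the chown generally has to be issued as the HDFS superuser.

```shell
#!/bin/sh
# Assumed values; override them from the environment to match your cluster.
MR_HISTORY_DIR="${MR_HISTORY_DIR:-/mr-history}"   # assumed location of the directory
OWNER_GROUP="${OWNER_GROUP:-mapred:hdfs}"          # one reading of "mapred and hdfs"

# Run a command only when RUN=1; otherwise print it for review.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

check_and_fix_mr_history() {
    # -d lists the directory entry itself, which shows its owner and group.
    run hdfs dfs -ls -d "$MR_HISTORY_DIR"
    # Changing ownership in HDFS requires the superuser account.
    run hdfs dfs -chown "$OWNER_GROUP" "$MR_HISTORY_DIR"
}

check_and_fix_mr_history
```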

TerryP · ‎02-10-2016

Have you tried to retrieve the logs using the YARN CLI?

yarn logs -applicationId <id of the application>

This will stream back the aggregated log to the screen ... if you have access in HDFS to see the log files. You will see a message about aggregation not being enabled if you lack permissions to see the log files. In that case, modify the command to use the application owner.

yarn logs -appOwner <user id> -applicationId <id of the application>
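If you want to confirm whether the files are visible to you in HDFS, something like the sketch below can build the directory to look at. It assumes the Hadoop 2.x defaults for yarn.nodemanager.remote-app-log-dir (/tmp/logs) and yarn.nodemanager.remote-app-log-dir-suffix (logs); your cluster may well use different values, and the agg_log_path helper is just a name made up for this example.

```shell
#!/bin/sh
# Assumed Hadoop 2.x defaults; check yarn-site.xml for the real values:
#   yarn.nodemanager.remote-app-log-dir         (default /tmp/logs)
#   yarn.nodemanager.remote-app-log-dir-suffix  (default logs)
REMOTE_LOG_DIR="${REMOTE_LOG_DIR:-/tmp/logs}"
LOG_DIR_SUFFIX="${LOG_DIR_SUFFIX:-logs}"

# Build the HDFS directory that holds one application's aggregated logs.
# $1 = application owner, $2 = application id
agg_log_path() {
    printf '%s/%s/%s/%s\n' "$REMOTE_LOG_DIR" "$1" "$LOG_DIR_SUFFIX" "$2"
}

# Usage on a cluster node (same placeholders as the commands above):
#   hdfs dfs -ls "$(agg_log_path <user id> <id of the application>)"
```

If that listing fails with a permission error, that would be consistent with the CLI needing the -appOwner form.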

zack_riesland · ‎02-10-2016

Thanks Terry,

Both of these approaches work - I get back the relevant logs for a given application ID.

But I'm interested in (and tasked with) getting the UI links to work for simplicity of all the folks on our team.

zack_riesland · ‎02-10-2016

I just got a reply from my support ticket and it was literally a link to this thread.

So I guess we better figure it out here!

aervits · ‎02-10-2016

@Zack Riesland if you have a dev cluster, try going step by step and enabling TS (the Timeline Server). Also look for any deprecated properties.

zack_riesland · ‎02-10-2016

We don't have a dev cluster. I am the only one who needs the cluster today, so I can break stuff as long as it is put back together by tonight's ingest.

It sounds like that's the only way I'm going to get this to work...

aervits · ‎02-10-2016

@Zack Riesland you can roll back to the old configs when you are done if it doesn't work. Take some standard precautions, like backing up yarn-site.xml. Definitely post your results here.
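A minimal sketch of that kind of backup, in case it helps. The backup_config function is a made-up helper and the path in the usage line is an assumption; point it at wherever your yarn-site.xml actually lives.

```shell
#!/bin/sh
# Copy a config file next to itself with a timestamp suffix,
# preserving mode and timestamps, and print the path of the copy.
backup_config() {
    src="$1"
    dest="${src}.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$src" "$dest" && echo "$dest"
}

# Usage (assumed path):
#   backup_config /etc/hadoop/conf/yarn-site.xml
```

Rolling back is then just copying the printed backup file over the edited one and restarting the affected services.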

zack_riesland · ‎02-10-2016

Thanks,

Another followup:

The instructions say:

yarn.timeline-service.entity-group-fs-store.active-dir and yarn.timeline-service.entity-group-fs-store.done-dir must exist on the cluster on HDFS. Active-dir should have permission 01777, owned by YARN, group admin-group. Done-dir should have permission 0700, owned by yarn, group admin-group.

2 things:

1) 01777 isn't a valid permission set

2) when it says 'admin-group', does it literally mean that the group should be set to 'admin-group', or just a group with admin privileges? Almost everything in HDFS seems to be in either 'hadoop' or 'hdfs'.

aervits · ‎02-10-2016

@Zack Riesland 1. I believe they mean 1777, where the leading 1 sets the sticky bit. https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html

2. Yes, it means a group with admin privileges, not literally admin-group.
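To make both answers concrete, here is a hedged sketch. The first part shows on a local temp directory that 01777 is just 1777 written with a leading octal zero. The second part only prints the HDFS commands one might run; the paths /ats/active and /ats/done and the group hadoop are placeholders I picked for illustration, so substitute the values from your own yarn-site.xml and whichever admin group your cluster uses.

```shell
#!/bin/sh
# Local demonstration: the leading 0 in 01777 is only octal notation.
demo_dir=$(mktemp -d)
chmod 01777 "$demo_dir"
ls -ld "$demo_dir"      # mode column starts with drwxrwxrwt; the t is the sticky bit
rmdir "$demo_dir"

# Hypothetical paths and group; take the real ones from yarn-site.xml
# (yarn.timeline-service.entity-group-fs-store.active-dir / done-dir).
ACTIVE_DIR="${ACTIVE_DIR:-/ats/active}"
DONE_DIR="${DONE_DIR:-/ats/done}"
ADMIN_GROUP="${ADMIN_GROUP:-hadoop}"

# Print the HDFS commands for review; nothing is executed against the cluster.
timeline_dir_cmds() {
    echo "hdfs dfs -mkdir -p $ACTIVE_DIR $DONE_DIR"
    echo "hdfs dfs -chmod 1777 $ACTIVE_DIR"
    echo "hdfs dfs -chmod 700 $DONE_DIR"
    echo "hdfs dfs -chown yarn:$ADMIN_GROUP $ACTIVE_DIR $DONE_DIR"
}

timeline_dir_cmds
```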

aervits · ‎02-10-2016

@Mark Herring FYI

Workaround for log aggregation bug