We are seeing an issue in our Kerberized environment where the AppTimelineServer component of YARN is unable to re-login via its keytab file after the 10-hour ticket lifetime.

New Contributor
When YARN is restarted, it logs in via the yarn.service.keytab file, but that login is only good for the 10-hour ticket lifetime. After that, it does not automatically re-login via the same keytab file, so a long-running job cannot be executed because we get a kinit failure. Is anyone aware of any issue regarding the ATS component?


After a YARN restart, it is able to log in via the keytab:
2016-07-06 15:51:37,301 INFO org.apache.hadoop.security.UserGroupInformation: Login successful for user yarn/chicago-jumbo@QE.COM using keytab file /etc/security/keytabs/yarn.service.keytab
2016-07-06 15:51:37,305 INFO org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore: Starting EntityGroupFSTimelineStore


After 10 hours, it fails:
2016-07-07 01:51:38,105 ERROR org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore: Error scanning active files
java.io.IOException: Unable to obtain the kerberos principal (did you kinit?)
at com.emc.hadoop.fs.vipr.auth.ClientAuthenticator.getAuthToken(ClientAuthenticator.java:76)
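To make the timing concrete, here is a small sketch that computes the gap between the two log timestamps quoted above. The timestamps are copied from the excerpts; the helper name `elapsed_between` is only illustrative. It is a quick sanity check, not a diagnosis.

```python
from datetime import datetime, timedelta

# Timestamps copied from the two log excerpts above
# (log4j-style, with a comma before the milliseconds).
LOGIN_TS = "2016-07-06 15:51:37,301"    # "Login successful ... using keytab file"
FAILURE_TS = "2016-07-07 01:51:38,105"  # "Error scanning active files"

LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_between(start: str, end: str) -> timedelta:
    """Return the time elapsed between two log timestamps."""
    return datetime.strptime(end, LOG_FORMAT) - datetime.strptime(start, LOG_FORMAT)


elapsed = elapsed_between(LOGIN_TS, FAILURE_TS)
print(elapsed)  # 10:00:00.804000
```

The first error shows up less than a second after the 10-hour mark, which is consistent with the failure being tied to ticket expiry rather than to something unrelated.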

Mentor

What version of HDP are you on?

Guru

You may be hitting a bug related to the Java version: if you are on Java 1.7, you have to stick to 1.7 update 79 or earlier.

Super Guru

@Manoj A, which version of HDP are you on? I ask because similar issues were resolved in 2.7 with YARN-3227.

Contributor

I was going to say that the bug sounded familiar, and I see that I commented on YARN-3227 over a year ago. Glad to hear that bug has been fixed now.

But I also see that the original poster is using YARN 2.7.1, suggesting it is a different problem. :-(

New Contributor

My HDP version is HDP-2.3.4.0-3485 and Yarn is 2.7.1.2.3 with Java "1.8.0_91".

New Contributor

We are now seeing this kinit failure for the Hadoop service principals for Hive and the YARN NM too. Is there any resolution to this issue?

Thanks

Ajay

App logs -

16/07/12 10:46:44 INFO mapreduce.Job: Job job_1468117769609_0085 failed with state FAILED due to: Application application_1468117769609_0085 failed 2 times due to AM Container for appattempt_1468117769609_0085_000002 exited with exitCode: -1000

For more detailed output, check application tracking page:http://lehi-jumbo.ecs.lab.emc.com:8088/cluster/app/application_1468117769609_0085Then, click on links to logs of each attempt.

Diagnostics: java.io.IOException: Unable to obtain the token for service principal vipr/logan-fern.ecs.lab.emc.com@QE.EMC.COM and user nm/layton-jumbo.ecs.lab.emc.com@QE.EMC.COM. Did you kinit?

Failing this attempt. Failing the application.

16/07/12 10:46:44 DEBUG security.UserGroupInfor

====================

NM logs -

2016-07-12 09:26:24,090 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /var/hadoop/yarn/local/nmPrivate/container_e23_1468250014062_0009_02_000001.tokens. Credentials list:

2016-07-12 09:26:24,091 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download resource { { viprfs://kecsbuck.ns.Site1/50GB-terasort-output/_partition.lst, 1468329976375, FILE, null },pending,[(container_e23_1468250014062_0009_02_000001)],711564276569983,DOWNLOADING}

java.io.IOException: Unable to obtain the token for service principal vipr/houston-boysenberry.ecs.lab.emc.com@QE.EMC.COM and user nm/phoenix-jumbo.ecs.lab.emc.com@QE.EMC.COM. Did you kinit?

at com.emc.hadoop.fs.vipr.auth.ClientAuthenticator.getAuthToken(ClientAuthenticator.java:80)
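When several services start failing like this, it can help to pull the service principal and the user principal out of each error line to see which accounts are affected. Below is a rough sketch of that; the regular expression is derived only from the two messages quoted in this thread, and the names `TOKEN_ERROR` and `parse_token_error` are made up for illustration.

```python
import re

# Pattern based solely on the two error messages quoted in this thread.
TOKEN_ERROR = re.compile(
    r"Unable to obtain the token for service principal (?P<service>\S+) "
    r"and user (?P<user>\S+)\. Did you kinit\?"
)


def parse_token_error(line):
    """Return (service_principal, user_principal), or None if the line does not match."""
    match = TOKEN_ERROR.search(line)
    if match is None:
        return None
    return match.group("service"), match.group("user")


# The two error lines from the app and NM logs above.
SAMPLE_LINES = [
    "Diagnostics: java.io.IOException: Unable to obtain the token for service "
    "principal vipr/logan-fern.ecs.lab.emc.com@QE.EMC.COM and user "
    "nm/layton-jumbo.ecs.lab.emc.com@QE.EMC.COM. Did you kinit?",
    "java.io.IOException: Unable to obtain the token for service principal "
    "vipr/houston-boysenberry.ecs.lab.emc.com@QE.EMC.COM and user "
    "nm/phoenix-jumbo.ecs.lab.emc.com@QE.EMC.COM. Did you kinit?",
]

for line in SAMPLE_LINES:
    print(parse_token_error(line))
```

For the two lines above, this yields two different nm/ principals talking to two different vipr/ service principals, so the failure is not limited to a single NodeManager host.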
