FWIW, I seem to have found a solution. I had added a call to ugi.checkTGTAndReloginFromKeytab(), but it hadn't worked. Later, while debugging, I found that the call was trying to renew the proxy user, not the underlying principal. I changed the code so that it gets the underlying principal's ugi and calls the same method on that, and now it seems to work.

Two questions are still outstanding, if anyone cares to investigate further. First, why was this only necessary for DARE? All other operations (HDFS, Hive, YARN, etc.) continued working and renewing their krbtgt tickets indefinitely. Second, was the CDH upgrade needed, or would the fix have worked with the older version as well?
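For anyone hitting the same thing, here is a rough sketch of the change described above: resolve the real user behind a proxy UGI before asking for a re-login. It is not the code from our app. To keep it runnable without a cluster, `Ugi` is a hypothetical stand-in interface whose two methods mirror `getRealUser()` and `checkTGTAndReloginFromKeytab()` on Hadoop's `org.apache.hadoop.security.UserGroupInformation`, and `FakeUgi` is a hypothetical in-memory fake that only records which principal was asked to re-login.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ReloginSketch {

    /** Hypothetical stand-in for the two UserGroupInformation methods used here. */
    interface Ugi {
        Ugi getRealUser(); // null when this UGI is not a proxy user
        void checkTGTAndReloginFromKeytab() throws IOException;
    }

    /** Returns the UGI that holds the keytab login: the real user if ugi is a proxy. */
    static Ugi underlyingPrincipal(Ugi ugi) {
        Ugi real = ugi.getRealUser();
        return real != null ? real : ugi;
    }

    /** Asks the underlying principal, not the proxy, to check its TGT and re-login. */
    static void reloginUnderlyingPrincipal(Ugi ugi) throws IOException {
        underlyingPrincipal(ugi).checkTGTAndReloginFromKeytab();
    }

    /** Hypothetical fake: appends its name to a shared log when asked to re-login. */
    static final class FakeUgi implements Ugi {
        private final String name;
        private final Ugi realUser;
        private final List<String> reloginLog;

        FakeUgi(String name, Ugi realUser, List<String> reloginLog) {
            this.name = name;
            this.realUser = realUser;
            this.reloginLog = reloginLog;
        }

        @Override
        public Ugi getRealUser() {
            return realUser;
        }

        @Override
        public void checkTGTAndReloginFromKeytab() {
            reloginLog.add(name);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> log = new ArrayList<>();
        Ugi service = new FakeUgi("app/host@REALM", null, log); // keytab principal
        Ugi proxy = new FakeUgi("enduser", service, log);       // proxy on top of it
        reloginUnderlyingPrincipal(proxy);
        System.out.println("Re-login requested for: " + log);
    }
}
```

In a real application the same two calls would be made on the actual `UserGroupInformation` object instead of the stand-in. Whether that is sufficient in other setups is an assumption on my part; it is only what appeared to work here.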
I'm running into a similar problem, but only with regard to Data At Rest Encryption (DARE). All other HDFS operations keep working indefinitely and tickets renew as needed. With DARE, everything seems to be set up correctly and works transparently through our app for about an hour; after that, all we get are "Execution of 'abc.csv' failed. Error details: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)" errors.

I thought this might be related to HADOOP-12559 and/or HADOOP-10786, but we upgraded our test environment to CDH 5.8.5 and the problem persists. A manual kinit does not seem to help (and I see valid tickets for our app and for hdfs). Restarting our app seems to reset everything, but I can find no explicit Kerberos login that would account for that.

My best guess is that there is some principal (possibly HTTP/ourserver.com@REALM.com?) that needs to renew its ticket so that it can authenticate against the KMS, but doesn't. I tried manually running kinit for the HTTP principal on the CM server, but to no avail. An alternative possibility is that something else is failing and the TGT error is a red herring, but the fact that it fails after a fixed time inclines me to think it's a Kerberos issue. Any help appreciated!