Ambari Metrics not responding periodically

Hi,

We have a Kerberos-secured cluster and are currently facing issues with Ambari Metrics.
After starting Ambari Metrics everything is fine, but after a couple of days we get alerts from Ambari like this:

NameNode Service RPC Processing Latency (Hourly)
Unable to retrieve metrics from the Ambari Metrics service.

When I check the logs of the Metrics Collector, I find entries like:

2018-03-28 11:19:47,013 WARN org.apache.hadoop.security.UserGroupInformation: Exception encountered while running the renewal command for amshbase/s0202.cl.psiori.com@PSIORI.COM. (TGT end time:1522228847000, renewalFailures: org.apache.hadoop.metrics2.lib.MutableGaugeInt@388f50cd,renewalFailuresTotal: org.apache.hadoop.metrics2.lib.MutableGaugeLong@7d8dc9b8)
ExitCodeException exitCode=1: kinit: KDC can't fulfill requested option while renewing credentials
at org.apache.hadoop.util.Shell.runCommand(Shell.java:954)
at org.apache.hadoop.util.Shell.run(Shell.java:855)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1163)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1257)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1239)
at org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:987)
at java.lang.Thread.run(Thread.java:745)
2018-03-28 11:19:47,014 ERROR org.apache.hadoop.security.UserGroupInformation: TGT is expired. Aborting renew thread for amshbase/s0202.cl.psiori.com@PSIORI.COM.
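
The "TGT end time" in the warning looks like milliseconds since the Unix epoch (an assumption based on its magnitude). A minimal sketch that converts it to local time, using a fixed UTC+2 offset for CEST as in the collector's own timestamps; the helper name `epoch_ms_to_local` is made up for illustration:

```python
from datetime import datetime, timedelta, timezone

# Value copied from the "TGT end time" field of the WARN line.
# Assumption: the unit is milliseconds since the Unix epoch.
TGT_END_MS = 1522228847000

# CEST is UTC+2; a fixed offset avoids depending on a time zone database.
CEST = timezone(timedelta(hours=2), "CEST")


def epoch_ms_to_local(ms, tz=CEST):
    """Convert epoch milliseconds to an aware datetime in the given zone."""
    return datetime.fromtimestamp(ms / 1000, tz=tz)


tgt_end = epoch_ms_to_local(TGT_END_MS)
print(tgt_end.strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2018-03-28 11:20:47 CEST
```

Under that assumption the ticket was valid until 11:20:47 CEST, so the failed renewal at 11:19:47 was attempted about one minute before the ticket expired.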

After that, I see aggregation errors:

2018-03-28 11:27:08,188 INFO TimelineClusterAggregatorMinute: Started Timeline aggregator thread @ Wed Mar 28 11:27:08 CEST 2018
2018-03-28 11:27:08,189 INFO TimelineClusterAggregatorMinute: Skipping aggregation function not owned by this instance.
2018-03-28 11:27:08,205 ERROR TimelineMetricHostAggregatorHourly: Exception during aggregating metrics.
	java.sql.SQLTimeoutException: Operation timed out.
at org.apache.phoenix.exception.SQLExceptionCode$14.newException(SQLExceptionCode.java:364)
at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:831)

So this seems to be related to Kerberos. When I check the log of the KDC, there is not much information:

Mar 28 11:19:47 sql.cl.psiori.com krb5kdc[879](info): TGS_REQ (8 etypes {18 17 20 19 16 23 25 26}) 10.11.1.21: TICKET NOT RENEWABLE: authtime 0,  amshbase/s0202.cl.psiori.com@PSIORI.COM for krbtgt/PSIORI.COM@PSIORI.COM, KDC can't fulfill requested option
...
Mar 28 11:20:48 sql.cl.psiori.com krb5kdc[879](info): AS_REQ (4 etypes {18 17 16 23}) 10.11.1.21: ISSUE: authtime 1522228848, etypes {rep=18 tkt=18 ses=18}, amshbase/s0202.cl.psiori.com@PSIORI.COM for krbtgt/PSIORI.COM@PSIORI.COM
Mar 28 11:20:48 sql.cl.psiori.com krb5kdc[879](info): TGS_REQ (4 etypes {18 17 16 23}) 10.11.1.21: ISSUE: authtime 1522228848, etypes {rep=18 tkt=18 ses=18}, amshbase/s0202.cl.psiori.com@PSIORI.COM for nn/m0201.cl.psiori.com@PSIORI.COM
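
To read these KDC entries side by side, here is a rough sketch that pulls out the request type, outcome, and principals from two of the lines above. The regular expression is a hypothetical pattern written only for these two line shapes, not a general krb5kdc log parser, and the "TGT end time" from the collector warning is assumed to be in epoch milliseconds:

```python
import re

# KDC log lines from the post; the wrapped TGS_REQ entry is joined into one line.
KDC_LOG = [
    "Mar 28 11:19:47 sql.cl.psiori.com krb5kdc[879](info): TGS_REQ (8 etypes {18 17 20 19 16 23 25 26}) "
    "10.11.1.21: TICKET NOT RENEWABLE: authtime 0,  amshbase/s0202.cl.psiori.com@PSIORI.COM "
    "for krbtgt/PSIORI.COM@PSIORI.COM, KDC can't fulfill requested option",
    "Mar 28 11:20:48 sql.cl.psiori.com krb5kdc[879](info): AS_REQ (4 etypes {18 17 16 23}) "
    "10.11.1.21: ISSUE: authtime 1522228848, etypes {rep=18 tkt=18 ses=18}, "
    "amshbase/s0202.cl.psiori.com@PSIORI.COM for krbtgt/PSIORI.COM@PSIORI.COM",
]

# Hypothetical pattern covering only the two line shapes shown above.
PATTERN = re.compile(
    r"(?P<req>AS_REQ|TGS_REQ) \(.*?\) (?P<client_ip>\S+): "
    r"(?P<status>[A-Z ]+): authtime (?P<authtime>\d+),\s+"
    r"(?:etypes \{[^}]*\}, )?"
    r"(?P<client>\S+) for (?P<server>[^\s,]+)"
)

# "TGT end time" from the collector warning, assumed to be epoch milliseconds.
TGT_END_S = 1522228847000 // 1000


def parse(line):
    """Return the named fields of a KDC log line, or None if it does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


events = [parse(line) for line in KDC_LOG]
for event in events:
    print(event["req"], "|", event["status"], "|", event["server"])

# Seconds between the end of the old TGT and the authtime of the new AS_REQ.
gap = int(events[1]["authtime"]) - TGT_END_S
print("seconds between old TGT end and new AS_REQ:", gap)
```

With these inputs the gap is one second: the renewal (TGS_REQ) was refused, and a fresh AS_REQ for the same principal was issued right after the old ticket's end time. That only describes what the two logs show; it does not by itself explain the aggregation errors.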

When I check the principal amshbase/s0202.cl.psiori.com@PSIORI.COM in the KDC I get the following:

Principal: amshbase/s0202.cl.psiori.com@PSIORI.COM
Expiration date: [never]
Last password change: Mo Mär 19 11:24:05 CET 2018
Password expiration date: [never]
Maximum ticket life: 1 day 00:00:00
Maximum renewable life: 0 days 00:00:00
Last modified: Mo Mär 19 11:24:05 CET 2018 (admin/admin@PSIORI.COM)
Last successful authentication: [never]
Last failed authentication: [never]
Failed password attempts: 0
Number of keys: 2
Key: vno 1, aes256-cts-hmac-sha1-96
Key: vno 1, aes128-cts-hmac-sha1-96
MKey: vno 1
Attributes:
Policy: [none]

Is that normal? Maximum renewable life is set to 0, so ticket renewal is not possible. But that is also true for all other principals in the KDC, and all other services work normally.
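
Since the same check would have to be repeated for every principal, a small sketch for reading the two lifetime fields out of `getprinc` output may help. It assumes the output format shown above ("N day(s) HH:MM:SS"); the function names are invented for this example:

```python
import re
from datetime import timedelta

# Relevant lines from the getprinc output quoted in the post.
GETPRINC_OUTPUT = """\
Principal: amshbase/s0202.cl.psiori.com@PSIORI.COM
Maximum ticket life: 1 day 00:00:00
Maximum renewable life: 0 days 00:00:00
"""


def parse_fields(text):
    """Split 'Key: value' lines into a dict, cutting at the first colon only."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parse_life(value):
    """Parse a lifetime such as '1 day 00:00:00' into a timedelta."""
    match = re.fullmatch(r"(\d+) days? (\d+):(\d+):(\d+)", value)
    if match is None:
        raise ValueError(f"unrecognized lifetime: {value!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


fields = parse_fields(GETPRINC_OUTPUT)
max_life = parse_life(fields["Maximum ticket life"])
max_renew = parse_life(fields["Maximum renewable life"])

# A zero maximum renewable life means tickets for this principal cannot be renewed.
renewable = max_renew > timedelta(0)
print(fields["Principal"], "| max life:", max_life, "| renewable:", renewable)
```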

This is the content of krb5.conf:

[libdefaults]
renew_lifetime = 7d
forwardable = true
default_realm = PSIORI.COM
ticket_lifetime = 24h
dns_lookup_realm = false
dns_lookup_kdc = false
default_ccache_name = /tmp/krb5cc_%{uid}
#default_tgs_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5
#default_tkt_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5

[domain_realm]
.cl.psiori.com = PSIORI.COM
cl.psiori.com = PSIORI.COM

[logging]
default = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
kdc = FILE:/var/log/krb5kdc.log

[realms]
PSIORI.COM = {
admin_server = sql.cl.psiori.com
kdc = sql.cl.psiori.com
}

I have not applied any changes to kdc.conf, so it has the default content:

[kdcdefaults]
kdc_ports = 88
kdc_tcp_ports = 88

[realms]
EXAMPLE.COM = {
#master_key_type = aes256-cts
acl_file = /var/kerberos/krb5kdc/kadm5.acl
dict_file = /usr/share/dict/words
admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
}

Is there any misconfiguration?
Unfortunately, the Hortonworks installation documentation doesn't give detailed information about how to configure the Kerberos KDC correctly; it just refers to the official MIT KDC documentation.

When I restart the service, everything is fine again (for some time).

Any suggestions or help are very welcome.

Best regards,
Alex

1 ACCEPTED SOLUTION

Master Mentor

@Alexander Schätzle


The error says "TICKET NOT RENEWABLE", which can happen if the principal does not have the renewable attribute.
Please check the principal attributes using the "kadmin.local" utility, something like this:

kadmin: getprinc <PRINCIPAL> 


If you find that it does not have the renewable attribute, please modify the principal and add the renewable flag to it, something like the following:

kadmin: modprinc -maxrenewlife "7 days" +allow_renewable krbtgt/XXXX.COM@XXXX.COM
kadmin: modprinc -maxrenewlife "7 days" +allow_renewable "amshbase/s0202.cl.xxxx.com@XXXX.COM"
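
If several principals need the same change, the command lines can be generated rather than typed by hand. The sketch below only prints them for review; it assumes the commands are then run on the KDC host through `kadmin.local -q`, and the helper name `modprinc_command` is made up for this example. The principal names are the unmasked ones from the question:

```python
import shlex


def modprinc_command(principal, max_renew_life="7 days"):
    """Build a kadmin.local command line that makes `principal` renewable."""
    query = f'modprinc -maxrenewlife "{max_renew_life}" +allow_renewable {principal}'
    # shlex.join quotes the query so the shell passes it to -q as one argument.
    return shlex.join(["kadmin.local", "-q", query])


# As in the commands above, the krbtgt principal is modified along with
# the service principal.
PRINCIPALS = [
    "krbtgt/PSIORI.COM@PSIORI.COM",
    "amshbase/s0202.cl.psiori.com@PSIORI.COM",
]

for principal in PRINCIPALS:
    print(modprinc_command(principal))
```

Printing instead of executing keeps the step reviewable; nothing is changed in the KDC until the printed lines are run deliberately.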

Please use the correct principal names; I have masked the principal names in the sample commands above.


After executing the above commands to modify the principals, please run "kinit -R" to see whether it still throws an error while renewing.

I modified the principals so that they can be issued renewable tickets. I no longer get errors when renewing the ticket.

But I'm wondering why this is necessary at all. None of the principals in the KDC could obtain renewable tickets, and all other services work fine. If a ticket is not renewable, the service could simply request a new ticket. Or do I misunderstand something here?

Master Mentor

@Alexander Schätzle

Also, please check whether your KDC configuration has settings like the following:

# cat /var/kerberos/krb5kdc/kdc.conf

max_life = 24h 0m 0s
max_renewable_life = 7d 0h 0m 0s
default_principal_flags = +renewable, +forwardable


I updated the KDC configuration. However, I also had to create a realm definition in kdc.conf under [realms]; just putting the configuration values under [kdcdefaults] didn't help.
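
One possible reading of this, offered as an assumption rather than a confirmed diagnosis: the default kdc.conf quoted in the question defines only an EXAMPLE.COM block under [realms], while the cluster's realm is PSIORI.COM. A small sketch that lists the realm blocks in a kdc.conf text and checks whether the expected realm is among them; the parser is deliberately simplistic and the function name is invented:

```python
import re

# Shortened copy of the default kdc.conf quoted in the question.
KDC_CONF = """\
[kdcdefaults]
kdc_ports = 88
kdc_tcp_ports = 88

[realms]
EXAMPLE.COM = {
#master_key_type = aes256-cts
acl_file = /var/kerberos/krb5kdc/kadm5.acl
admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
}
"""

# default_realm from the krb5.conf quoted in the question.
DEFAULT_REALM = "PSIORI.COM"


def realms_defined(conf_text):
    """Return the names that open a 'NAME = {' block under [realms]."""
    realms = []
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            section = header.group(1)
            continue
        block = re.fullmatch(r"(\S+)\s*=\s*\{", line)
        if section == "realms" and block:
            realms.append(block.group(1))
    return realms


defined = realms_defined(KDC_CONF)
print("realm blocks:", defined)
print("block for default realm present:", DEFAULT_REALM in defined)
```

For the quoted file this reports only EXAMPLE.COM, which is consistent with having to add a PSIORI.COM block before the realm-level settings took effect.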

But I'm still confused about why this is necessary at all. Why does the Metrics Collector not simply request a new ticket instead of renewing the old one?