
Hive Metastore Canary error in Cloudera Runtime 7.0.3 - GSS initiate failed


Dear Community Members, 

 

After enabling Kerberos, the Hive Metastore instance is showing the following error:

 

[pool-6-thread-79]: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed
	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:701) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:698) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_232]
	at javax.security.auth.Subject.doAs(Subject.java:360) ~[?:1.8.0_232]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1856) ~[hadoop-common-3.1.1.7.0.3.0-79.jar:?]
	at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:698) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269) [hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
Caused by: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed
	at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:199) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	... 10 more

I think the Kerberos principal for Hive is correct:

 

(screenshot: the Kerberos principal configured for Hive in Cloudera Manager)

What did I miss? 

Thanks for your help.

 

Regards,

Gabor

7 Replies

Guru

Hi @Dombai_Gabor,

 

From the error message, it looks like the Delegation tokens have expired. There is a possibility that HMS is using MemoryTokenStore.

 

Is this a test server? If so, please try the troubleshooting steps below.

1. In Cloudera Manager, search for the following configuration:
   Hive Metastore Delegation Token Store
2. Set the value to org.apache.hadoop.hive.thrift.DBTokenStore (a quick way to verify the change is sketched right after this list).
3. After the change, save and restart the Hive and Hue services.
4. See if the issue is resolved.
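
If you want to double-check that the new value actually reached the running Metastore, a check along these lines may help. The property name hive.cluster.delegation.token.store.class and the process-directory glob are assumptions based on a typical Cloudera Manager layout, so adjust them to your host:

# Show the token store class the Metastore process was started with
# (run on the HMS host; pick the newest *HIVEMETASTORE* directory)
grep -A1 "hive.cluster.delegation.token.store.class" \
  /var/run/cloudera-scm-agent/process/*HIVEMETASTORE*/hive-site.xml

After the restart it should report org.apache.hadoop.hive.thrift.DBTokenStore rather than the default MemoryTokenStore.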

This may be related to the known bug https://issues.apache.org/jira/browse/HIVE-10574.

 

Hope this helps!

Li Wang, Technical Solution Manager




Yes, this is a test cluster. I have done the steps.

 

(screenshot: the Hive Metastore Delegation Token Store setting changed to DBTokenStore)

 

It is still throwing the same errors:

 

[pool-6-thread-39]: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed
	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:701) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:698) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_232]
	at javax.security.auth.Subject.doAs(Subject.java:360) ~[?:1.8.0_232]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1856) ~[hadoop-common-3.1.1.7.0.3.0-79.jar:?]
	at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:698) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269) [hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
Caused by: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed
	at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:199) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) ~[hive-exec-3.1.2000.7.0.3.0-79.jar:3.1.2000.7.0.3.0-79]
	... 10 more

Do you have any idea?

 

Thanks,

Gabor

Super Guru
Hi @Dombai_Gabor,

Can you check that the keytab files under the /var/run/cloudera-scm-agent/process/*HIVEMETASTORE* directories are valid and that you can kinit using those keytab files without issues?
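
For example, something along these lines; the process directory number and the hostname are placeholders, and the keytab file name hive.keytab is an assumption based on the usual Cloudera Manager layout:

# List the principals and encryption types stored in the Metastore keytab
klist -kt /var/run/cloudera-scm-agent/process/<NNN>-hive-HIVEMETASTORE/hive.keytab

# Try to obtain a ticket with that keytab and the matching principal
kinit -kt /var/run/cloudera-scm-agent/process/<NNN>-hive-HIVEMETASTORE/hive.keytab hive/<hms-host-fqdn>@DOMAIN.LOCAL

# Confirm a valid TGT was issued
klist

If the kinit fails, the error it prints usually narrows down whether the problem is the keytab, the principal, or the KDC.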

Also, does the error happen on start up or after running for a while?

Thanks
Eric


Hi @EricL!

 

Yes, I can kinit with the Hive keytab file from the mentioned folder without a problem.

 

The error appears after running for approximately 2-3 minutes.

 

Thanks.
Gabor


Hi @EricL 

 

So, do you have any idea? 🙂

 

Thanks.

Gábor

Super Guru
Hi Gabor,

Can you share the krb5.conf file on the HMS host for review?

Cheers
Eric


Hello @EricL 

 

Of course!

 

[libdefaults]
  default_realm = DOMAIN.LOCAL
  dns_lookup_kdc = false
  dns_lookup_realm = false
  ticket_lifetime = 86400
  renew_lifetime = 604800
  forwardable = true
  default_tgs_enctypes = rc4-hmac aes128-cts aes256-cts des-cbc-crc des-cbc-md5
  default_tkt_enctypes = rc4-hmac aes128-cts aes256-cts des-cbc-crc des-cbc-md5
  permitted_enctypes = rc4-hmac aes128-cts aes256-cts des-cbc-crc des-cbc-md5
  udp_preference_limit = 1
  kdc_timeout = 3000

[realms]
  DOMAIN.LOCAL = {
    kdc = d20-ad01.domain.local
    admin_server = d20-ad01.domain.local
    kdc = d20-ad01.domain.local
    admin_server = d20-ad01.domain.local
  }

[domain_realm]

[logging]
  default = FILE:/var/log/krb5libs.log
  kdc = FILE:/var/log/krb5kdc.log
  admin_server = FILE:/var/log/kadmind.log