Member since
06-16-2017
8
Posts
0
Kudos Received
0
Solutions
06-22-2017
07:24 AM
Turned out to be a silly mistake - this line had the clue - Login failure for nm/node28.my.com@HDP from keytab nm.service.keytab. The keytab's absolute path should be in the message. In ambari blueprint, node manager keytab configuration did not have the absolute path which is - /etc/security/keytabs/nm.service.keytab
... View more
06-22-2017
01:32 AM
I see the following error in NodeManager startup log org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed NodeManager login
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:300)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:547)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:594)
Caused by: java.io.IOException: Login failure for nm/node28.my.com@HDP from keytab nm.service.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1098)
at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:307)
at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:271)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.doSecureLogin(NodeManager.java:156)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:298)
... 3 more
Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user
at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:897)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1089)
... 7 more
but when I do kinit -kt /etc/security/keytabs/nm.service.keytab nm/node28.my.com@HDP
and then
klist
Ticket cache: FILE:/tmp/krb5cc_1011
Default principal: nm/node28.my.com@HDP
Valid starting Expires Service principal
06/21/2017 18:30:06 06/22/2017 18:30:06 krbtgt/HDP@HDP
Unable to figure out why Nodemanager fails with that exception!
... View more
Labels:
06-19-2017
07:00 AM
This could probably be because when installing with ambari blueprint, zookeeper server_jvmflags does not have the following system property - SERVER_JVMFLAGS="-Djava.security.auth.login.config=/path/to/zookeeper_jaas.conf"
Going to modify the blueprint and try again.
... View more
06-19-2017
12:21 AM
When installing using Ambari blueprint, the ambari_infra_solr installation fails most of the times with the following error - zkClient received AuthFailed
Error updating path [/clusterprops.json]
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /clusterprops.json
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:311)
at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:308)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:308)
at org.apache.solr.common.cloud.ZkStateReader.setClusterProperty(ZkStateReader.java:738)
at org.apache.ambari.logsearch.solr.commands.SetClusterPropertyZkCommand.executeZkCommand(SetClusterPropertyZkCommand.java:37)
at org.apache.ambari.logsearch.solr.commands.SetClusterPropertyZkCommand.executeZkCommand(SetClusterPropertyZkCommand.java:26)
at org.apache.ambari.logsearch.solr.commands.AbstractZookeeperRetryCommand.createAndProcessRequest(AbstractZookeeperRetryCommand.java:38)
at org.apache.ambari.logsearch.solr.commands.AbstractRetryCommand.retry(AbstractRetryCommand.java:45)
at org.apache.ambari.logsearch.solr.commands.AbstractRetryCommand.run(AbstractRetryCommand.java:40)
at org.apache.ambari.logsearch.solr.AmbariSolrCloudClient.setClusterProp(AmbariSolrCloudClient.java:127)
at org.apache.ambari.logsearch.solr.AmbariSolrCloudCLI.main(AmbariSolrCloudCLI.java:502) Note: (a) We do not have these hosts in the DNS entries. We are just updating the /etc/hosts file on all VMs to try out blueprint installation. So, nslookup testnode22.dc.somecompany.com will give no entry. (b) This problem started happening couple of days back, (before that, installation succeeded most of the times) In the last 2 days however, it failed 9 times of the 10 times I tried with the exact same exception. The only time it succeeded was when I deleted the principal from kerberos database - but that was a fluke. I tried to repeat the process, but, it didn't work. The complete relevant log including kerberos log is below - On node testnode22.dc.somecompany.com $ klist -kt /etc/security/keytabs/ambari-infra-solr.service.keytab
Keytab name: FILE:/etc/security/keytabs/ambari-infra-solr.service.keytab
KVNO Timestamp Principal
---- ------------------- ------------------------------------------------------
1 06/16/2017 14:17:31 infra-solr/testnode22.dc.somecompany.com@HDP
1 06/16/2017 14:17:31 infra-solr/testnode22.dc.somecompany.com@HDP
1 06/16/2017 14:17:31 infra-solr/testnode22.dc.somecompany.com@HDP
1 06/16/2017 14:17:31 infra-solr/testnode22.dc.somecompany.com@HDP
1 06/16/2017 14:17:31 infra-solr/testnode22.dc.somecompany.com@HDP
$ cat /etc/ambari-infra-solr/conf/infra_solr_jaas.conf
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
useTicketCache=false
keyTab="/etc/security/keytabs/ambari-infra-solr.service.keytab"
principal="infra-solr/testnode22.dc.somecompany.com@HDP";
};
On node mykdc.dc.somecompany.com $ kadmin.local: get_principal infra-solr/testnode22.dc.somecompany.com@HDP
Principal: infra-solr/testnode22.dc.somecompany.com@HDP
Expiration date: [never]
Last password change: Fri Jun 16 14:17:28 MST 2017
Password expiration date: [none]
Maximum ticket life: 1 day 00:00:00
Maximum renewable life: 0 days 00:00:00
Last modified: Fri Jun 16 14:17:28 MST 2017 (admin/admin@HDP)
Last successful authentication: [never]
Last failed authentication: [never]
Failed password attempts: 0
Number of keys: 6
Key: vno 1, aes256-cts-hmac-sha1-96
Key: vno 1, aes128-cts-hmac-sha1-96
Key: vno 1, des3-cbc-sha1
Key: vno 1, arcfour-hmac
Key: vno 1, des-hmac-sha1
Key: vno 1, des-cbc-md5
MKey: vno 1
Attributes:
Policy: [none]
Finally, the relevant installation log - Initiating client connection, connectString=testnode21.dc.somecompany.com:2181,testnode22.dc.somecompany.com:2181,testnode23.dc.somecompany.com:2181/infra-solr sessionTimeout=15000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@512ddf17
Waiting for client to connect to ZooKeeper
>>> KeyTabInputStream, readName(): HDP
>>> KeyTabInputStream, readName(): infra-solr
>>> KeyTabInputStream, readName(): testnode22.dc.somecompany.com
>>> KeyTab: load() entry length: 81; type: 16
>>> KeyTabInputStream, readName(): HDP
>>> KeyTabInputStream, readName(): infra-solr
>>> KeyTabInputStream, readName(): testnode22.dc.somecompany.com
>>> KeyTab: load() entry length: 73; type: 17
>>> KeyTabInputStream, readName(): HDP
>>> KeyTabInputStream, readName(): infra-solr
>>> KeyTabInputStream, readName(): testnode22.dc.somecompany.com
>>> KeyTab: load() entry length: 65; type: 3
>>> KeyTabInputStream, readName(): HDP
>>> KeyTabInputStream, readName(): infra-solr
>>> KeyTabInputStream, readName(): testnode22.dc.somecompany.com
>>> KeyTab: load() entry length: 73; type: 23
>>> KeyTabInputStream, readName(): HDP
>>> KeyTabInputStream, readName(): infra-solr
>>> KeyTabInputStream, readName(): testnode22.dc.somecompany.com
>>> KeyTab: load() entry length: 89; type: 18
Looking for keys for: infra-solr/testnode22.dc.somecompany.com@HDP
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
Added key: 18version: 26
Added key: 23version: 26
Found unsupported keytype (3) for infra-solr/testnode22.dc.somecompany.com@HDP
Added key: 17version: 26
Added key: 16version: 26
>>> KdcAccessibility: reset
Looking for keys for: infra-solr/testnode22.dc.somecompany.com@HDP
Added key: 18version: 26
Added key: 23version: 26
Found unsupported keytype (3) for infra-solr/testnode22.dc.somecompany.com@HDP
Added key: 17version: 26
Added key: 16version: 26
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 18 17 16 23.
>>> KrbAsReq creating message
>>> KrbKdcReq send: kdc=mykdc.dc.somecompany.com UDP:88, timeout=30000, number of retries =3, #bytes=157
>>> KDCCommunication: kdc=mykdc.dc.somecompany.com UDP:88, timeout=30000,Attempt =1, #bytes=157
>>> KrbKdcReq send: #bytes read=686
>>> KdcAccessibility: remove mykdc.dc.somecompany.com
Looking for keys for: infra-solr/testnode22.dc.somecompany.com@HDP
Added key: 18version: 26
Added key: 23version: 26
Found unsupported keytype (3) for infra-solr/testnode22.dc.somecompany.com@HDP
Added key: 17version: 26
Added key: 16version: 26
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbAsRep cons in KrbAsReq.getReply infra-solr/testnode22.dc.somecompany.com
successfully logged in.
TGT refresh thread started.
Client will use GSSAPI as SASL mechanism.
TGT valid starting at: Fri Jun 16 00:37:02 MST 2017
TGT expires: Sat Jun 17 00:37:02 MST 2017
TGT refresh sleeping until: Fri Jun 16 20:11:03 MST 2017
Opening socket connection to server testnode21.dc.somecompany.com/10.17.104.51:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
Socket connection established to testnode21.dc.somecompany.com/10.17.104.51:2181, initiating session
Session establishment complete on server testnode21.dc.somecompany.com/10.17.104.51:2181, sessionid = 0x15cafb160940006, negotiated timeout = 15000
Found ticket for infra-solr/testnode22.dc.somecompany.com@HDP to go to krbtgt/HDP@HDP expiring on Sat Jun 17 00:37:02 MST 2017
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for infra-solr/testnode22.dc.somecompany.com@HDP to go to krbtgt/HDP@HDP expiring on Sat Jun 17 00:37:02 MST 2017
Service ticket not found in the subject
>>> Credentials acquireServiceCreds: same realm
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 18 17 16 23.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
Watcher org.apache.solr.common.cloud.ConnectionManager@187aa366 name:ZooKeeperConnection Watcher:testnode21.dc.somecompany.com:2181,testnode22.dc.somecompany.com:2181,testnode23.dc.somecompany.com:2181/infra-solr got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
Client is connected to ZooKeeper
Using default ZkACLProvider
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbKdcReq send: kdc=mykdc.dc.somecompany.com UDP:88, timeout=30000, number of retries =3, #bytes=681
>>> KDCCommunication: kdc=mykdc.dc.somecompany.com UDP:88, timeout=30000,Attempt =1, #bytes=681
Set cluster prop: 'urlScheme'
>>> KrbKdcReq send: #bytes read=706
>>> KdcAccessibility: remove mykdc.dc.somecompany.com
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbApReq: APOptions are 00000000 00000000 00000000 00000000
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
Krb5Context setting mySeqNumber to: 536186913
Krb5Context setting peerSeqNumber to: 0
Created InitSecContextToken:
0000: 01 00 6E 82 02 75 30 82 02 71 A0 03 02 01 05 A1 ..n..u0..q......
0010: 03 02 01 0E A2 07 03 05 00 00 00 00 00 A3 82 01 ................
…Some bunch of similar lines as above…
…Some bunch of similar lines as above…
…Some bunch of similar lines as above…
0260: 5D AC 0E EB 5B 9C ED A9 2E FD 17 EB E0 1D B6 C1 ]...[...........
0270: BF 79 AF C1 EB 19 1E 41 95 8E 8F .y.....A...
SASL authentication failed using login context 'Client'.
SASL authentication failed using login context 'Client'.
Watcher org.apache.solr.common.cloud.ConnectionManager@187aa366 name:ZooKeeperConnection Watcher:testnode21.dc.somecompany.com:2181,testnode22.dc.somecompany.com:2181,testnode23.dc.somecompany.com:2181/infra-solr got event WatchedEvent state:AuthFailed type:None path:null path:null type:None
zkClient received AuthFailed
Error updating path [/clusterprops.json]
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /clusterprops.json
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:311)
at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:308)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:308)
at org.apache.solr.common.cloud.ZkStateReader.setClusterProperty(ZkStateReader.java:738)
at org.apache.ambari.logsearch.solr.commands.SetClusterPropertyZkCommand.executeZkCommand(SetClusterPropertyZkCommand.java:37)
at org.apache.ambari.logsearch.solr.commands.SetClusterPropertyZkCommand.executeZkCommand(SetClusterPropertyZkCommand.java:26)
at org.apache.ambari.logsearch.solr.commands.AbstractZookeeperRetryCommand.createAndProcessRequest(AbstractZookeeperRetryCommand.java:38)
at org.apache.ambari.logsearch.solr.commands.AbstractRetryCommand.retry(AbstractRetryCommand.java:45)
at org.apache.ambari.logsearch.solr.commands.AbstractRetryCommand.run(AbstractRetryCommand.java:40)
at org.apache.ambari.logsearch.solr.AmbariSolrCloudClient.setClusterProp(AmbariSolrCloudClient.java:127)
at org.apache.ambari.logsearch.solr.AmbariSolrCloudCLI.main(AmbariSolrCloudCLI.java:502)
On the Zookeeper server log, I see the following line - 2017-06-17 23:59:04,508 - ERROR [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@981] - cnxn.saslServer is null: cnxn object did not initialize its saslServer properly.
2017-06-17 23:59:15,000 - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x15cb2c4b7070016, timeout of 30000ms exceeded
2017-06-17 23:59:15,000 - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x15cb2c4b7070016
2017-06-17 23:59:24,530 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x35cb2c4bac6000f, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
... View more
Labels:
06-16-2017
10:40 PM
well, that was not the solution. created a question here
... View more
06-16-2017
05:34 PM
In my case I was testing different blueprints so I was installing on the same hosts over and over again. This error started coming up after 3 days and after deleting the principal from kerberos database (and allowing ambari to recreate it during install) this step passed. There was another error of course... but thats for a different post 🙂
... View more