Created 06-19-2017 12:21 AM
When installing using Ambari blueprint, the ambari_infra_solr installation fails most of the times with the following error -
zkClient received AuthFailed Error updating path [/clusterprops.json] org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /clusterprops.json at org.apache.zookeeper.KeeperException.create(KeeperException.java:123) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045) at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:311) at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:308) at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60) at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:308) at org.apache.solr.common.cloud.ZkStateReader.setClusterProperty(ZkStateReader.java:738) at org.apache.ambari.logsearch.solr.commands.SetClusterPropertyZkCommand.executeZkCommand(SetClusterPropertyZkCommand.java:37) at org.apache.ambari.logsearch.solr.commands.SetClusterPropertyZkCommand.executeZkCommand(SetClusterPropertyZkCommand.java:26) at org.apache.ambari.logsearch.solr.commands.AbstractZookeeperRetryCommand.createAndProcessRequest(AbstractZookeeperRetryCommand.java:38) at org.apache.ambari.logsearch.solr.commands.AbstractRetryCommand.retry(AbstractRetryCommand.java:45) at org.apache.ambari.logsearch.solr.commands.AbstractRetryCommand.run(AbstractRetryCommand.java:40) at org.apache.ambari.logsearch.solr.AmbariSolrCloudClient.setClusterProp(AmbariSolrCloudClient.java:127) at org.apache.ambari.logsearch.solr.AmbariSolrCloudCLI.main(AmbariSolrCloudCLI.java:502)
Note:
(a) We do not have these hosts in the DNS entries. We are just updating the /etc/hosts file on all VMs to try out blueprint installation. So, nslookup testnode22.dc.somecompany.com will give no entry.
(b) This problem started happening couple of days back, (before that, installation succeeded most of the times) In the last 2 days however, it failed 9 times of the 10 times I tried with the exact same exception. The only time it succeeded was when I deleted the principal from kerberos database - but that was a fluke. I tried to repeat the process, but, it didn't work.
The complete relevant log including kerberos log is below -
On node testnode22.dc.somecompany.com
$ klist -kt /etc/security/keytabs/ambari-infra-solr.service.keytab Keytab name: FILE:/etc/security/keytabs/ambari-infra-solr.service.keytab KVNO Timestamp Principal ---- ------------------- ------------------------------------------------------ 1 06/16/2017 14:17:31 infra-solr/testnode22.dc.somecompany.com@HDP 1 06/16/2017 14:17:31 infra-solr/testnode22.dc.somecompany.com@HDP 1 06/16/2017 14:17:31 infra-solr/testnode22.dc.somecompany.com@HDP 1 06/16/2017 14:17:31 infra-solr/testnode22.dc.somecompany.com@HDP 1 06/16/2017 14:17:31 infra-solr/testnode22.dc.somecompany.com@HDP
$ cat /etc/ambari-infra-solr/conf/infra_solr_jaas.conf Client { com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true storeKey=true useTicketCache=false keyTab="/etc/security/keytabs/ambari-infra-solr.service.keytab" principal="infra-solr/testnode22.dc.somecompany.com@HDP"; };
On node mykdc.dc.somecompany.com
$ kadmin.local: get_principal infra-solr/testnode22.dc.somecompany.com@HDP Principal: infra-solr/testnode22.dc.somecompany.com@HDP Expiration date: [never] Last password change: Fri Jun 16 14:17:28 MST 2017 Password expiration date: [none] Maximum ticket life: 1 day 00:00:00 Maximum renewable life: 0 days 00:00:00 Last modified: Fri Jun 16 14:17:28 MST 2017 (admin/admin@HDP) Last successful authentication: [never] Last failed authentication: [never] Failed password attempts: 0 Number of keys: 6 Key: vno 1, aes256-cts-hmac-sha1-96 Key: vno 1, aes128-cts-hmac-sha1-96 Key: vno 1, des3-cbc-sha1 Key: vno 1, arcfour-hmac Key: vno 1, des-hmac-sha1 Key: vno 1, des-cbc-md5 MKey: vno 1 Attributes: Policy: [none]
Finally, the relevant installation log -
Initiating client connection, connectString=testnode21.dc.somecompany.com:2181,testnode22.dc.somecompany.com:2181,testnode23.dc.somecompany.com:2181/infra-solr sessionTimeout=15000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@512ddf17 Waiting for client to connect to ZooKeeper >>> KeyTabInputStream, readName(): HDP >>> KeyTabInputStream, readName(): infra-solr >>> KeyTabInputStream, readName(): testnode22.dc.somecompany.com >>> KeyTab: load() entry length: 81; type: 16 >>> KeyTabInputStream, readName(): HDP >>> KeyTabInputStream, readName(): infra-solr >>> KeyTabInputStream, readName(): testnode22.dc.somecompany.com >>> KeyTab: load() entry length: 73; type: 17 >>> KeyTabInputStream, readName(): HDP >>> KeyTabInputStream, readName(): infra-solr >>> KeyTabInputStream, readName(): testnode22.dc.somecompany.com >>> KeyTab: load() entry length: 65; type: 3 >>> KeyTabInputStream, readName(): HDP >>> KeyTabInputStream, readName(): infra-solr >>> KeyTabInputStream, readName(): testnode22.dc.somecompany.com >>> KeyTab: load() entry length: 73; type: 23 >>> KeyTabInputStream, readName(): HDP >>> KeyTabInputStream, readName(): infra-solr >>> KeyTabInputStream, readName(): testnode22.dc.somecompany.com >>> KeyTab: load() entry length: 89; type: 18 Looking for keys for: infra-solr/testnode22.dc.somecompany.com@HDP Java config name: null Native config name: /etc/krb5.conf Loaded from native config Added key: 18version: 26 Added key: 23version: 26 Found unsupported keytype (3) for infra-solr/testnode22.dc.somecompany.com@HDP Added key: 17version: 26 Added key: 16version: 26 >>> KdcAccessibility: reset Looking for keys for: infra-solr/testnode22.dc.somecompany.com@HDP Added key: 18version: 26 Added key: 23version: 26 Found unsupported keytype (3) for infra-solr/testnode22.dc.somecompany.com@HDP Added key: 17version: 26 Added key: 16version: 26 Using builtin default etypes for default_tkt_enctypes default etypes for default_tkt_enctypes: 18 17 16 23. >>> KrbAsReq creating message >>> KrbKdcReq send: kdc=mykdc.dc.somecompany.com UDP:88, timeout=30000, number of retries =3, #bytes=157 >>> KDCCommunication: kdc=mykdc.dc.somecompany.com UDP:88, timeout=30000,Attempt =1, #bytes=157 >>> KrbKdcReq send: #bytes read=686 >>> KdcAccessibility: remove mykdc.dc.somecompany.com Looking for keys for: infra-solr/testnode22.dc.somecompany.com@HDP Added key: 18version: 26 Added key: 23version: 26 Found unsupported keytype (3) for infra-solr/testnode22.dc.somecompany.com@HDP Added key: 17version: 26 Added key: 16version: 26 >>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType >>> KrbAsRep cons in KrbAsReq.getReply infra-solr/testnode22.dc.somecompany.com successfully logged in. TGT refresh thread started. Client will use GSSAPI as SASL mechanism. TGT valid starting at: Fri Jun 16 00:37:02 MST 2017 TGT expires: Sat Jun 17 00:37:02 MST 2017 TGT refresh sleeping until: Fri Jun 16 20:11:03 MST 2017 Opening socket connection to server testnode21.dc.somecompany.com/10.17.104.51:2181. Will attempt to SASL-authenticate using Login Context section 'Client' Socket connection established to testnode21.dc.somecompany.com/10.17.104.51:2181, initiating session Session establishment complete on server testnode21.dc.somecompany.com/10.17.104.51:2181, sessionid = 0x15cafb160940006, negotiated timeout = 15000 Found ticket for infra-solr/testnode22.dc.somecompany.com@HDP to go to krbtgt/HDP@HDP expiring on Sat Jun 17 00:37:02 MST 2017 Entered Krb5Context.initSecContext with state=STATE_NEW Found ticket for infra-solr/testnode22.dc.somecompany.com@HDP to go to krbtgt/HDP@HDP expiring on Sat Jun 17 00:37:02 MST 2017 Service ticket not found in the subject >>> Credentials acquireServiceCreds: same realm Using builtin default etypes for default_tgs_enctypes default etypes for default_tgs_enctypes: 18 17 16 23. >>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType Watcher org.apache.solr.common.cloud.ConnectionManager@187aa366 name:ZooKeeperConnection Watcher:testnode21.dc.somecompany.com:2181,testnode22.dc.somecompany.com:2181,testnode23.dc.somecompany.com:2181/infra-solr got event WatchedEvent state:SyncConnected type:None path:null path:null type:None Client is connected to ZooKeeper Using default ZkACLProvider >>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType >>> KrbKdcReq send: kdc=mykdc.dc.somecompany.com UDP:88, timeout=30000, number of retries =3, #bytes=681 >>> KDCCommunication: kdc=mykdc.dc.somecompany.com UDP:88, timeout=30000,Attempt =1, #bytes=681 Set cluster prop: 'urlScheme' >>> KrbKdcReq send: #bytes read=706 >>> KdcAccessibility: remove mykdc.dc.somecompany.com >>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType >>> KrbApReq: APOptions are 00000000 00000000 00000000 00000000 >>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType Krb5Context setting mySeqNumber to: 536186913 Krb5Context setting peerSeqNumber to: 0 Created InitSecContextToken: 0000: 01 00 6E 82 02 75 30 82 02 71 A0 03 02 01 05 A1 ..n..u0..q...... 0010: 03 02 01 0E A2 07 03 05 00 00 00 00 00 A3 82 01 ................ …Some bunch of similar lines as above… …Some bunch of similar lines as above… …Some bunch of similar lines as above… 0260: 5D AC 0E EB 5B 9C ED A9 2E FD 17 EB E0 1D B6 C1 ]...[........... 0270: BF 79 AF C1 EB 19 1E 41 95 8E 8F .y.....A... SASL authentication failed using login context 'Client'. SASL authentication failed using login context 'Client'. Watcher org.apache.solr.common.cloud.ConnectionManager@187aa366 name:ZooKeeperConnection Watcher:testnode21.dc.somecompany.com:2181,testnode22.dc.somecompany.com:2181,testnode23.dc.somecompany.com:2181/infra-solr got event WatchedEvent state:AuthFailed type:None path:null path:null type:None zkClient received AuthFailed Error updating path [/clusterprops.json] org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /clusterprops.json at org.apache.zookeeper.KeeperException.create(KeeperException.java:123) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045) at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:311) at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:308) at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60) at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:308) at org.apache.solr.common.cloud.ZkStateReader.setClusterProperty(ZkStateReader.java:738) at org.apache.ambari.logsearch.solr.commands.SetClusterPropertyZkCommand.executeZkCommand(SetClusterPropertyZkCommand.java:37) at org.apache.ambari.logsearch.solr.commands.SetClusterPropertyZkCommand.executeZkCommand(SetClusterPropertyZkCommand.java:26) at org.apache.ambari.logsearch.solr.commands.AbstractZookeeperRetryCommand.createAndProcessRequest(AbstractZookeeperRetryCommand.java:38) at org.apache.ambari.logsearch.solr.commands.AbstractRetryCommand.retry(AbstractRetryCommand.java:45) at org.apache.ambari.logsearch.solr.commands.AbstractRetryCommand.run(AbstractRetryCommand.java:40) at org.apache.ambari.logsearch.solr.AmbariSolrCloudClient.setClusterProp(AmbariSolrCloudClient.java:127) at org.apache.ambari.logsearch.solr.AmbariSolrCloudCLI.main(AmbariSolrCloudCLI.java:502)
On the Zookeeper server log, I see the following line -
2017-06-17 23:59:04,508 - ERROR [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@981] - cnxn.saslServer is null: cnxn object did not initialize its saslServer properly. 2017-06-17 23:59:15,000 - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x15cb2c4b7070016, timeout of 30000ms exceeded 2017-06-17 23:59:15,000 - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x15cb2c4b7070016 2017-06-17 23:59:24,530 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x35cb2c4bac6000f, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:745)
Created 06-19-2017 07:00 AM
This could probably be because when installing with ambari blueprint, zookeeper server_jvmflags does not have the following system property -
SERVER_JVMFLAGS="-Djava.security.auth.login.config=/path/to/zookeeper_jaas.conf"
Going to modify the blueprint and try again.