Support Questions

Find answers, ask questions, and share your expertise

cnxn.saslServer is null: cnxn object did not initialize its saslServer properly when installing ambari-infra-solr

avatar
Explorer

When installing using Ambari blueprint, the ambari_infra_solr installation fails most of the times with the following error -

zkClient received AuthFailed
Error updating path [/clusterprops.json]
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /clusterprops.json
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
	at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:311)
	at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:308)
	at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
	at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:308)
	at org.apache.solr.common.cloud.ZkStateReader.setClusterProperty(ZkStateReader.java:738)
	at org.apache.ambari.logsearch.solr.commands.SetClusterPropertyZkCommand.executeZkCommand(SetClusterPropertyZkCommand.java:37)
	at org.apache.ambari.logsearch.solr.commands.SetClusterPropertyZkCommand.executeZkCommand(SetClusterPropertyZkCommand.java:26)
	at org.apache.ambari.logsearch.solr.commands.AbstractZookeeperRetryCommand.createAndProcessRequest(AbstractZookeeperRetryCommand.java:38)
	at org.apache.ambari.logsearch.solr.commands.AbstractRetryCommand.retry(AbstractRetryCommand.java:45)
	at org.apache.ambari.logsearch.solr.commands.AbstractRetryCommand.run(AbstractRetryCommand.java:40)
	at org.apache.ambari.logsearch.solr.AmbariSolrCloudClient.setClusterProp(AmbariSolrCloudClient.java:127)
	at org.apache.ambari.logsearch.solr.AmbariSolrCloudCLI.main(AmbariSolrCloudCLI.java:502)

Note:

(a) We do not have these hosts in the DNS entries. We are just updating the /etc/hosts file on all VMs to try out blueprint installation. So, nslookup testnode22.dc.somecompany.com will give no entry.

(b) This problem started happening couple of days back, (before that, installation succeeded most of the times) In the last 2 days however, it failed 9 times of the 10 times I tried with the exact same exception. The only time it succeeded was when I deleted the principal from kerberos database - but that was a fluke. I tried to repeat the process, but, it didn't work.

The complete relevant log including kerberos log is below -

On node testnode22.dc.somecompany.com

$ klist -kt /etc/security/keytabs/ambari-infra-solr.service.keytab
Keytab name: FILE:/etc/security/keytabs/ambari-infra-solr.service.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 06/16/2017 14:17:31 infra-solr/testnode22.dc.somecompany.com@HDP
   1 06/16/2017 14:17:31 infra-solr/testnode22.dc.somecompany.com@HDP
   1 06/16/2017 14:17:31 infra-solr/testnode22.dc.somecompany.com@HDP
   1 06/16/2017 14:17:31 infra-solr/testnode22.dc.somecompany.com@HDP
   1 06/16/2017 14:17:31 infra-solr/testnode22.dc.somecompany.com@HDP


$ cat /etc/ambari-infra-solr/conf/infra_solr_jaas.conf

Client {
 com.sun.security.auth.module.Krb5LoginModule required
 useKeyTab=true
 storeKey=true
 useTicketCache=false
 keyTab="/etc/security/keytabs/ambari-infra-solr.service.keytab"
 principal="infra-solr/testnode22.dc.somecompany.com@HDP";
};

On node mykdc.dc.somecompany.com

$ kadmin.local:  get_principal infra-solr/testnode22.dc.somecompany.com@HDP
Principal: infra-solr/testnode22.dc.somecompany.com@HDP
Expiration date: [never]
Last password change: Fri Jun 16 14:17:28 MST 2017
Password expiration date: [none]
Maximum ticket life: 1 day 00:00:00
Maximum renewable life: 0 days 00:00:00
Last modified: Fri Jun 16 14:17:28 MST 2017 (admin/admin@HDP)
Last successful authentication: [never]
Last failed authentication: [never]
Failed password attempts: 0
Number of keys: 6
Key: vno 1, aes256-cts-hmac-sha1-96
Key: vno 1, aes128-cts-hmac-sha1-96
Key: vno 1, des3-cbc-sha1
Key: vno 1, arcfour-hmac
Key: vno 1, des-hmac-sha1
Key: vno 1, des-cbc-md5
MKey: vno 1
Attributes:
Policy: [none]

Finally, the relevant installation log -

Initiating client connection, connectString=testnode21.dc.somecompany.com:2181,testnode22.dc.somecompany.com:2181,testnode23.dc.somecompany.com:2181/infra-solr sessionTimeout=15000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@512ddf17
Waiting for client to connect to ZooKeeper
>>> KeyTabInputStream, readName(): HDP
>>> KeyTabInputStream, readName(): infra-solr
>>> KeyTabInputStream, readName(): testnode22.dc.somecompany.com
>>> KeyTab: load() entry length: 81; type: 16
>>> KeyTabInputStream, readName(): HDP
>>> KeyTabInputStream, readName(): infra-solr
>>> KeyTabInputStream, readName(): testnode22.dc.somecompany.com
>>> KeyTab: load() entry length: 73; type: 17
>>> KeyTabInputStream, readName(): HDP
>>> KeyTabInputStream, readName(): infra-solr
>>> KeyTabInputStream, readName(): testnode22.dc.somecompany.com
>>> KeyTab: load() entry length: 65; type: 3
>>> KeyTabInputStream, readName(): HDP
>>> KeyTabInputStream, readName(): infra-solr
>>> KeyTabInputStream, readName(): testnode22.dc.somecompany.com
>>> KeyTab: load() entry length: 73; type: 23
>>> KeyTabInputStream, readName(): HDP
>>> KeyTabInputStream, readName(): infra-solr
>>> KeyTabInputStream, readName(): testnode22.dc.somecompany.com
>>> KeyTab: load() entry length: 89; type: 18
Looking for keys for: infra-solr/testnode22.dc.somecompany.com@HDP
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
Added key: 18version: 26
Added key: 23version: 26
Found unsupported keytype (3) for infra-solr/testnode22.dc.somecompany.com@HDP
Added key: 17version: 26
Added key: 16version: 26
>>> KdcAccessibility: reset
Looking for keys for: infra-solr/testnode22.dc.somecompany.com@HDP
Added key: 18version: 26
Added key: 23version: 26
Found unsupported keytype (3) for infra-solr/testnode22.dc.somecompany.com@HDP
Added key: 17version: 26
Added key: 16version: 26
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 18 17 16 23.
>>> KrbAsReq creating message
>>> KrbKdcReq send: kdc=mykdc.dc.somecompany.com UDP:88, timeout=30000, number of retries =3, #bytes=157
>>> KDCCommunication: kdc=mykdc.dc.somecompany.com UDP:88, timeout=30000,Attempt =1, #bytes=157
>>> KrbKdcReq send: #bytes read=686
>>> KdcAccessibility: remove mykdc.dc.somecompany.com
Looking for keys for: infra-solr/testnode22.dc.somecompany.com@HDP
Added key: 18version: 26
Added key: 23version: 26
Found unsupported keytype (3) for infra-solr/testnode22.dc.somecompany.com@HDP
Added key: 17version: 26
Added key: 16version: 26
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbAsRep cons in KrbAsReq.getReply infra-solr/testnode22.dc.somecompany.com
successfully logged in.
TGT refresh thread started.
Client will use GSSAPI as SASL mechanism.
TGT valid starting at:        Fri Jun 16 00:37:02 MST 2017
TGT expires:                  Sat Jun 17 00:37:02 MST 2017
TGT refresh sleeping until: Fri Jun 16 20:11:03 MST 2017
Opening socket connection to server testnode21.dc.somecompany.com/10.17.104.51:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
Socket connection established to testnode21.dc.somecompany.com/10.17.104.51:2181, initiating session
Session establishment complete on server testnode21.dc.somecompany.com/10.17.104.51:2181, sessionid = 0x15cafb160940006, negotiated timeout = 15000
Found ticket for infra-solr/testnode22.dc.somecompany.com@HDP to go to krbtgt/HDP@HDP expiring on Sat Jun 17 00:37:02 MST 2017
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for infra-solr/testnode22.dc.somecompany.com@HDP to go to krbtgt/HDP@HDP expiring on Sat Jun 17 00:37:02 MST 2017
Service ticket not found in the subject
>>> Credentials acquireServiceCreds: same realm
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 18 17 16 23.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
Watcher org.apache.solr.common.cloud.ConnectionManager@187aa366 name:ZooKeeperConnection Watcher:testnode21.dc.somecompany.com:2181,testnode22.dc.somecompany.com:2181,testnode23.dc.somecompany.com:2181/infra-solr got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
Client is connected to ZooKeeper
Using default ZkACLProvider
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbKdcReq send: kdc=mykdc.dc.somecompany.com UDP:88, timeout=30000, number of retries =3, #bytes=681
>>> KDCCommunication: kdc=mykdc.dc.somecompany.com UDP:88, timeout=30000,Attempt =1, #bytes=681
Set cluster prop: 'urlScheme'
>>> KrbKdcReq send: #bytes read=706
>>> KdcAccessibility: remove mykdc.dc.somecompany.com
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbApReq: APOptions are 00000000 00000000 00000000 00000000
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
Krb5Context setting mySeqNumber to: 536186913
Krb5Context setting peerSeqNumber to: 0
Created InitSecContextToken:
0000: 01 00 6E 82 02 75 30 82   02 71 A0 03 02 01 05 A1  ..n..u0..q......
0010: 03 02 01 0E A2 07 03 05   00 00 00 00 00 A3 82 01  ................
…Some bunch of similar lines as above…
…Some bunch of similar lines as above…
…Some bunch of similar lines as above…
0260: 5D AC 0E EB 5B 9C ED A9   2E FD 17 EB E0 1D B6 C1  ]...[...........
0270: BF 79 AF C1 EB 19 1E 41   95 8E 8F                 .y.....A...


SASL authentication failed using login context 'Client'.
SASL authentication failed using login context 'Client'.
Watcher org.apache.solr.common.cloud.ConnectionManager@187aa366 name:ZooKeeperConnection Watcher:testnode21.dc.somecompany.com:2181,testnode22.dc.somecompany.com:2181,testnode23.dc.somecompany.com:2181/infra-solr got event WatchedEvent state:AuthFailed type:None path:null path:null type:None
zkClient received AuthFailed
Error updating path [/clusterprops.json]
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /clusterprops.json
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
	at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:311)
	at org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:308)
	at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
	at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:308)
	at org.apache.solr.common.cloud.ZkStateReader.setClusterProperty(ZkStateReader.java:738)
	at org.apache.ambari.logsearch.solr.commands.SetClusterPropertyZkCommand.executeZkCommand(SetClusterPropertyZkCommand.java:37)
	at org.apache.ambari.logsearch.solr.commands.SetClusterPropertyZkCommand.executeZkCommand(SetClusterPropertyZkCommand.java:26)
	at org.apache.ambari.logsearch.solr.commands.AbstractZookeeperRetryCommand.createAndProcessRequest(AbstractZookeeperRetryCommand.java:38)
	at org.apache.ambari.logsearch.solr.commands.AbstractRetryCommand.retry(AbstractRetryCommand.java:45)
	at org.apache.ambari.logsearch.solr.commands.AbstractRetryCommand.run(AbstractRetryCommand.java:40)
	at org.apache.ambari.logsearch.solr.AmbariSolrCloudClient.setClusterProp(AmbariSolrCloudClient.java:127)
	at org.apache.ambari.logsearch.solr.AmbariSolrCloudCLI.main(AmbariSolrCloudCLI.java:502)

On the Zookeeper server log, I see the following line -

2017-06-17 23:59:04,508 - ERROR [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@981] - cnxn.saslServer is null: cnxn object did not initialize its saslServer properly.
2017-06-17 23:59:15,000 - INFO  [SessionTracker:ZooKeeperServer@347] - Expiring session 0x15cb2c4b7070016, timeout of 30000ms exceeded
2017-06-17 23:59:15,000 - INFO  [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x15cb2c4b7070016
2017-06-17 23:59:24,530 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x35cb2c4bac6000f, likely client has closed socket
	at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
	at java.lang.Thread.run(Thread.java:745)

1 REPLY 1

avatar
Explorer

This could probably be because when installing with ambari blueprint, zookeeper server_jvmflags does not have the following system property -

SERVER_JVMFLAGS="-Djava.security.auth.login.config=/path/to/zookeeper_jaas.conf"

Going to modify the blueprint and try again.