Created on 02-10-2020 11:58 PM - last edited on 02-11-2020 01:34 AM by VidyaSargur
Environment: HDP 2.6 plus Accumulo (Kerberos)
Hello! I've been searching for answers to this same problem on this side project for about a year now:
I have a Accumulo compliant client and when I connect to Accumulo, I get the following error. I would really appreciate some help to finally get past this issue.
2020-02-11 07:35:48.591/UTC INFO [:] [apache.zookeeper.ZooKeeper] Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2020-02-11 07:35:48.592/UTC INFO [:] [apache.zookeeper.ZooKeeper] Client environment:java.io.tmpdir=/tmp
2020-02-11 07:35:48.592/UTC INFO [:] [apache.zookeeper.ZooKeeper] Client environment:java.compiler=<NA>
2020-02-11 07:35:48.592/UTC INFO [:] [apache.zookeeper.ZooKeeper] Client environment:os.name=Linux
2020-02-11 07:35:48.592/UTC INFO [:] [apache.zookeeper.ZooKeeper] Client environment:os.arch=amd64
2020-02-11 07:35:48.592/UTC INFO [:] [apache.zookeeper.ZooKeeper] Client environment:os.version=4.19.76-linuxkit
2020-02-11 07:35:48.593/UTC INFO [:] [apache.zookeeper.ZooKeeper] Client environment:user.name=root
2020-02-11 07:35:48.593/UTC INFO [:] [apache.zookeeper.ZooKeeper] Client environment:user.home=/root
2020-02-11 07:35:48.594/UTC INFO [:] [apache.zookeeper.ZooKeeper] Client environment:user.dir=/app
2020-02-11 07:35:48.596/UTC INFO [:] [apache.zookeeper.ZooKeeper] Initiating client connection, connectString=dn0.dev:2181 sessionTimeout=30000 watcher=org.apache.accumulo.fate.zookeeper.ZooSession$ZooWatcher@6295d394
2020-02-11 07:35:48.604/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] zookeeper.disableAutoWatchReset is false
2020-02-11 07:35:48.661/UTC DEBUG [:] [zookeeper.client.ZooKeeperSaslClient] JAAS loginContext is: Client
Debug is true storeKey true useTicketCache true useKeyTab true doNotPrompt true ticketCache is null isInitiator true KeyTab is /etc/security/keytabs/accumulo.headless.keytab refreshKrb5Config is false principal is accumulo-dev tryFirstPass is false useFirstPass is false storePass is false clearPass is false
Java config name: null
Native config name: /etc/krb5.conf
Loaded from native config
Acquire TGT from Cache
>>>KinitOptions cache name is /tmp/krb5cc_0
>>>DEBUG <CCacheInputStream> client principal is accumulo-dev@EXAMPLE.COM
>>>DEBUG <CCacheInputStream> server principal is krbtgt/EXAMPLE.COM@EXAMPLE.COM
>>>DEBUG <CCacheInputStream> key type: 18
>>>DEBUG <CCacheInputStream> auth time: Tue Feb 11 07:08:44 UTC 2020
>>>DEBUG <CCacheInputStream> start time: Tue Feb 11 07:08:44 UTC 2020
>>>DEBUG <CCacheInputStream> end time: Wed Feb 12 07:08:44 UTC 2020
>>>DEBUG <CCacheInputStream> renew_till time: Tue Feb 18 07:08:44 UTC 2020
>>> CCacheInputStream: readFlags() FORWARDABLE; RENEWABLE; INITIAL; PRE_AUTH;
>>>DEBUG <CCacheInputStream> client principal is accumulo-dev@EXAMPLE.COM
>>>DEBUG <CCacheInputStream> server principal is X-CACHECONF:/krb5_ccache_conf_data/fast_avail/krbtgt/EXAMPLE.COM@EXAMPLE.COM@EXAMPLE.COM
>>>DEBUG <CCacheInputStream> key type: 0
>>>DEBUG <CCacheInputStream> auth time: Thu Jan 01 00:00:00 UTC 1970
>>>DEBUG <CCacheInputStream> start time: null
>>>DEBUG <CCacheInputStream> end time: Thu Jan 01 00:00:00 UTC 1970
>>>DEBUG <CCacheInputStream> renew_till time: null
>>> CCacheInputStream: readFlags()
Principal is accumulo-dev@EXAMPLE.COM
2020-02-11 07:35:48.895/UTC WARN [:] [apache.zookeeper.ClientCnxn] SASL configuration failed: javax.security.auth.login.LoginException: No key to store Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
2020-02-11 07:35:48.902/UTC INFO [:] [apache.zookeeper.ClientCnxn] Opening socket connection to server dn0.dev/172.20.0.4:2181
2020-02-11 07:35:49.046/UTC INFO [:] [apache.zookeeper.ClientCnxn] Socket connection established, initiating session, client: /172.20.0.4:44848, server: dn0.dev/172.20.0.4:2181
2020-02-11 07:35:49.051/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Session establishment request sent on dn0.dev/172.20.0.4:2181
2020-02-11 07:35:49.067/UTC TRACE [:] [apache.zookeeper.ClientCnxnSocket] readConnectResult 37 0x[0,0,0,0,0,0,75,30,1,70,32,ffffffaa,ffffffb1,18,0,3b,0,0,0,10,11,ffffff9a,fffffff8,47,3e,ffffff8c,ffffffeb,50,ffffffd0,48,ffffffb5,2a,ffffffc4,ffffff8c,ffffff99,f,0,]
2020-02-11 07:35:49.071/UTC INFO [:] [apache.zookeeper.ClientCnxn] Session establishment complete on server dn0.dev/172.20.0.4:2181, sessionid = 0x17032aab118003b, negotiated timeout = 30000
2020-02-11 07:35:49.166/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Reading reply sessionid:0x17032aab118003b, packet:: clientPath:null serverPath:null finished:false header:: 1,3 replyHeader:: 1,356,0 request:: '/accumulo/instances/hdp-accumulo-instance,T response:: s{51,51,1581400237148,1581400237148,0,0,0,0,36,0,51}
2020-02-11 07:35:49.170/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Reading reply sessionid:0x17032aab118003b, packet:: clientPath:null serverPath:null finished:false header:: 2,4 replyHeader:: 2,356,0 request:: '/accumulo/instances/hdp-accumulo-instance,T response:: #35326232633161372d666562382d343364332d613730612d613135646463326133323437,s{51,51,1581400237148,1581400237148,0,0,0,0,36,0,51}
2020-02-11 07:35:49.175/UTC TRACE [:] [fate.zookeeper.ZooCache] zookeeper contained /accumulo/instances/hdp-accumulo-instance 52b2c1a7-feb8-43d3-a70a-a15ddc2a3247
2020-02-11 07:35:49.175/UTC TRACE [:] [fate.zookeeper.ZooCache] putting /accumulo/instances/hdp-accumulo-instance 52b2c1a7-feb8-43d3-a70a-a15ddc2a3247 in cache
2020-02-11 07:35:49.177/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Reading reply sessionid:0x17032aab118003b, packet:: clientPath:null serverPath:null finished:false header:: 3,3 replyHeader:: 3,356,0 request:: '/accumulo/52b2c1a7-feb8-43d3-a70a-a15ddc2a3247,T response:: s{52,52,1581400237155,1581400237155,0,21,0,0,0,19,266}
2020-02-11 07:35:49.179/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Reading reply sessionid:0x17032aab118003b, packet:: clientPath:null serverPath:null finished:false header:: 4,4 replyHeader:: 4,356,0 request:: '/accumulo/52b2c1a7-feb8-43d3-a70a-a15ddc2a3247,T response:: ,s{52,52,1581400237155,1581400237155,0,21,0,0,0,19,266}
2020-02-11 07:35:49.179/UTC TRACE [:] [fate.zookeeper.ZooCache] zookeeper contained /accumulo/52b2c1a7-feb8-43d3-a70a-a15ddc2a3247
2020-02-11 07:35:49.180/UTC TRACE [:] [fate.zookeeper.ZooCache] putting /accumulo/52b2c1a7-feb8-43d3-a70a-a15ddc2a3247 in cache
2020-02-11 07:35:49.264/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Reading reply sessionid:0x17032aab118003b, packet:: clientPath:null serverPath:null finished:false header:: 5,8 replyHeader:: 5,356,0 request:: '/accumulo/52b2c1a7-feb8-43d3-a70a-a15ddc2a3247/tservers,T response:: v{'dn0.dev:9997}
2020-02-11 07:35:49.266/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Reading reply sessionid:0x17032aab118003b, packet:: clientPath:null serverPath:null finished:false header:: 6,8 replyHeader:: 6,356,0 request:: '/accumulo/52b2c1a7-feb8-43d3-a70a-a15ddc2a3247/tservers/dn0.dev:9997,T response:: v{'zlock-0000000000}
2020-02-11 07:35:49.269/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Reading reply sessionid:0x17032aab118003b, packet:: clientPath:null serverPath:null finished:false header:: 7,3 replyHeader:: 7,356,0 request:: '/accumulo/52b2c1a7-feb8-43d3-a70a-a15ddc2a3247/tservers/dn0.dev:9997/zlock-0000000000,T response:: s{251,251,1581400311221,1581400311221,0,0,0,103638500126490637,25,0,251}
2020-02-11 07:35:49.273/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Reading reply sessionid:0x17032aab118003b, packet:: clientPath:null serverPath:null finished:false header:: 8,4 replyHeader:: 8,356,0 request:: '/accumulo/52b2c1a7-feb8-43d3-a70a-a15ddc2a3247/tservers/dn0.dev:9997/zlock-0000000000,T response:: #54534552565f434c49454e543d646e302e6465763a39393937,s{251,251,1581400311221,1581400311221,0,0,0,103638500126490637,25,0,251}
2020-02-11 07:35:49.273/UTC TRACE [:] [fate.zookeeper.ZooCache] zookeeper contained /accumulo/52b2c1a7-feb8-43d3-a70a-a15ddc2a3247/tservers/dn0.dev:9997/zlock-0000000000 TSERV_CLIENT=dn0.dev:9997
2020-02-11 07:35:49.273/UTC TRACE [:] [fate.zookeeper.ZooCache] putting /accumulo/52b2c1a7-feb8-43d3-a70a-a15ddc2a3247/tservers/dn0.dev:9997/zlock-0000000000 TSERV_CLIENT=dn0.dev:9997 in cache
2020-02-11 07:35:49.311/UTC TRACE [:] [core.rpc.ThriftUtil] Opening normal transport
2020-02-11 07:35:49.333/UTC TRACE [:] [client.impl.ThriftTransportPool] Creating new connection to connection to dn0.dev:9997
2020-02-11 07:35:59.263/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Got ping response for sessionid: 0x17032aab118003b after 2ms
2020-02-11 07:36:09.275/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Got ping response for sessionid: 0x17032aab118003b after 2ms
2020-02-11 07:36:19.281/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Got ping response for sessionid: 0x17032aab118003b after 1ms
2020-02-11 07:36:29.263/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Got ping response for sessionid: 0x17032aab118003b after 2ms
2020-02-11 07:36:39.271/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Got ping response for sessionid: 0x17032aab118003b after 0ms
2020-02-11 07:36:49.281/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Got ping response for sessionid: 0x17032aab118003b after 2ms
2020-02-11 07:36:59.274/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Got ping response for sessionid: 0x17032aab118003b after 2ms
2020-02-11 07:37:09.285/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Got ping response for sessionid: 0x17032aab118003b after 2ms
2020-02-11 07:37:19.297/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Got ping response for sessionid: 0x17032aab118003b after 1ms
2020-02-11 07:37:29.289/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Got ping response for sessionid: 0x17032aab118003b after 1ms
2020-02-11 07:37:39.293/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Got ping response for sessionid: 0x17032aab118003b after 1ms
2020-02-11 07:37:49.305/UTC DEBUG [:] [apache.zookeeper.ClientCnxn] Got ping response for sessionid: 0x17032aab118003b after 2ms
2020-02-11 07:37:49.455/UTC DEBUG [:] [client.impl.ServerClient] ClientService request failed dn0.dev:9997, retrying ...
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.20.0.4:34652 remote=dn0.dev/172.20.0.4:9997]
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:270)
at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.accumulo.core.client.impl.thrift.ClientService$Client.recv_authenticate(ClientService.java:500)
at org.apache.accumulo.core.client.impl.thrift.ClientService$Client.authenticate(ClientService.java:486)
at org.apache.accumulo.core.client.impl.ConnectorImpl$1.execute(ConnectorImpl.java:70)
at org.apache.accumulo.core.client.impl.ConnectorImpl$1.execute(ConnectorImpl.java:67)
at org.apache.accumulo.core.client.impl.ServerClient.executeRaw(ServerClient.java:98)
at org.apache.accumulo.core.client.impl.ServerClient.execute(ServerClient.java:61)
at org.apache.accumulo.core.client.impl.ConnectorImpl.<init>(ConnectorImpl.java:67)
at org.apache.accumulo.core.client.ZooKeeperInstance.getConnector(ZooKeeperInstance.java:248)
at org.vertexium.accumulo.AccumuloGraphConfiguration.createConnector(AccumuloGraphConfiguration.java:99)
at org.vertexium.accumulo.AccumuloGraph.create(AccumuloGraph.java:171)
at org.vertexium.accumulo.AccumuloGraph.create(AccumuloGraph.java:295)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.vertexium.GraphFactory.createGraph(GraphFactory.java:18)
at org.visallo.tools.AccumuloConnectionCheck.getGraph(AccumuloConnectionCheck.java:104)
at org.visallo.tools.AccumuloConnectionCheck.<init>(AccumuloConnectionCheck.java:59)
at org.visallo.tools.AccumuloConnectionCheck.main(AccumuloConnectionCheck.java:70)
Caused by: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.20.0.4:34652 remote=dn0.dev/172.20.0.4:9997]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 27 more
2020-02-11 07:37:49.573/UTC TRACE [:] [client.impl.ThriftTransportPool] Returned connection had error dn0.dev:9997 (120000)
Anyone's help that is familiar with what would be causing this, would be great! In my reading, it looks like @Shelton might be able to know the answer to this.
Created 02-11-2020 04:11 AM
I'm getting this in the GC:
DEBUG: Sleeping for 300000 milliseconds
2020-02-11 12:09:24,132 [server.TThreadPoolServer] ERROR: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:51)
at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:48)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1846)
at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory.getTransport(UGIAssumingTransportFactory.java:48
Created 02-11-2020 04:12 AM
Then this:
2020-02-11 12:12:34,126 [zookeeper.ZooReader] WARN : Saw (possibly) transient exception communicating with ZooKeeper
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /accumulo/52b2c1a7-feb8-43d3-a70a-a15ddc2a3247/monitor/log4j_addr
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1184)
at org.apache.accumulo.fate.zookeeper.ZooReader.getData(ZooReader.java:75)
at org.apache.accumulo.fate.zookeeper.ZooReader.getData(ZooReader.java:67)
at org.apache.accumulo.server.watcher.MonitorLog4jWatcher.updateMonitorLog4jLocation(MonitorLog4jWatcher.java:114)
at org.apache.accumulo.server.watcher.MonitorLog4jWatcher.process(MonitorLog4jWatcher.java:98)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2020-02-11 12:12:34,126 [zookeeper.Retry] DEBUG: Sleeping for 250ms before retrying operation
2020-02-11 12:12:34,377 [zookeeper.ZooSession] DEBUG: Removing closed ZooKeeper session to dn0.dev:2181
2020-02-11 12:12:34,377 [zookeeper.ZooSession] DEBUG: Connecting to dn0.dev:2181 with timeout 30000 with auth
2020-02-11 12:12:34,482 [watcher.MonitorLog4jWatcher] INFO : Changing monitor log4j address to dn0.dev:4560
2020-02-11 12:12:34,482 [watcher.MonitorLog4jWatcher] INFO : Enabled log-forwarding
2020-02-11 12:12:34,538 [tracer.ZooTraceClient] DEBUG: Processing event for trace server zk watch
2020-02-11 12:12:34,578 [zookeeper.ZooLock] DEBUG: event null None Disconnected
2020-02-11 12:12:34,581 [watcher.MonitorLog4jWatcher] INFO : Changing monitor log4j address to dn0.dev:4560
2020-02-11 12:12:34,582 [watcher.MonitorLog4jWatcher] INFO : Enabled log-forwarding
2020-02-11 12:12:34,651 [zookeeper.ZooSession] DEBUG: Session expired, state of current session : Expired
2020-02-11 12:12:34,651 [tracer.ZooTraceClient] DEBUG: Processing event for trace server zk watch
2020-02-11 12:12:34,669 [zookeeper.ZooLock] DEBUG: event null None Expired
2020-02-11 12:12:34,678 [util.Halt] ERROR: FATAL GC lock in zookeeper lost (reason = SESSION_EXPIRED), exiting!
Created 02-11-2020 04:19 AM
The only explanation/possibility I've found is this but there is no solution:
"Ahh, looks like it was the ambari-agent process, part of HDP. Since that runs on the same machine, it wasn't in my tcpdump. Second time ambari-agent has done something unexpected for me! (first was renewing a keytab behind my back)"