Accumulo keeps crashing with error

Expert Contributor

The Thrift server stops responding and Accumulo crashes. The log shows many of these error messages, but they don't really point to what the issue is. Is anyone familiar with this?

ERROR: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
    at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:51)
    at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:48)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
    at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory.getTransport(UGIAssumingTransportFactory.java:48)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:208)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException

1 ACCEPTED SOLUTION

Super Guru

That exception isn't a direct cause of the server failing -- it's just saying that an RPC failed (it should be suppressed and not logged out). By "thrift server" do you mean TabletServer?

If so, also check the .out/.err files for the process. There may have been an out-of-memory issue that didn't get printed to the log4j file.
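If there are a lot of those files to go through, something like the following could help. This is only a rough sketch: the function name `scan_process_output` and the marker strings are my own assumptions, not anything Accumulo ships, and you need to point it at wherever your installation actually writes its .out/.err files.

```python
from pathlib import Path

# Assumption: strings that suggest the JVM ran out of memory.
# Adjust to whatever your JVM actually prints.
MARKERS = ("java.lang.OutOfMemoryError", "GC overhead limit exceeded")


def scan_process_output(log_dir, markers=MARKERS):
    """Return (file name, line number, line) for each marker hit in *.out / *.err files."""
    hits = []
    for path in sorted(Path(log_dir).iterdir()):
        # Only look at the stdout/stderr captures, not the log4j files.
        if not path.is_file() or path.suffix not in (".out", ".err"):
            continue
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(marker in line for marker in markers):
                    hits.append((path.name, number, line.rstrip()))
    return hits


if __name__ == "__main__":
    import sys

    # The log directory must be given explicitly; no default location is assumed.
    if len(sys.argv) > 1:
        for name, number, line in scan_process_output(sys.argv[1]):
            print(f"{name}:{number}: {line}")
```

Run it with the log directory as the only argument. An empty result only means none of the marker strings appeared, so it is still worth reading the tail of the files by hand.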

10 REPLIES

Super Guru

Yes, seeing "[Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]" errors in a service after it has been running for some time implies that the issue is ACCUMULO-4069.