Support Questions

Find answers, ask questions, and share your expertise

Accumulo TServer failing after Wizard Install completion

So I've just completed the Ambari install wizard (Ambari 2.2, HDP 2.3.4) and started the services. Everything is running and working now (not at first), except that 3 of the 5 hosts have the same issue with the Accumulo TServer. It starts up with the solid green check (photo attached) but stops after just a few seconds with the alert icon.

The only information I found about the error is "Connection failed: [Errno 111] Connection refused to", which showed up for the TServer process. I checked my SSH connection and it's fine, and all of the other services installed fine, so I'm not sure what exactly that means. I posted the logs below; the .err file just said no such directory, and the .out file is empty. Are there other locations with more verbose error logs? As I said, I am new to the environment.

Any general troubleshooting advice for initial issues after installation or links to guides that may help would also be very appreciated.

[root@xxxxxxxxxxxx ~]# cd /var/log/accumulo/
[root@xxxxxxxxxxxx accumulo]# ls
accumulo-tserver.err  accumulo-tserver.out
[root@xxxxxxxxxxxx accumulo]# cat accumulo-tserver.err
/usr/hdp/current/accumulo-client/bin/accumulo: line 4: /usr/hdp/ No such file or directory





I ended up just removing the TServer service from the nodes that were failing. Not really a solution, but the other ones still work fine. Thanks for your help!



Super Collaborator

This could be the same issue being discussed here:

Seems like Accumulo doesn't bind to the port after startup.
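One way to verify that by hand is to check whether anything is actually bound on the TabletServer's client port. This is a sketch: 9997 is the usual default for tserver.port.client, but confirm it against your own accumulo-site.xml.

```shell
# Check whether anything is listening on the TabletServer client port.
# 9997 is an assumption (the common default for tserver.port.client);
# "Connection refused" from Ambari usually means nothing is bound here.
port=9997
if ss -lnt 2>/dev/null | grep -q ":${port} "; then
  echo "something is listening on ${port}"
else
  echo "nothing bound to ${port}; the tserver likely exited right after start"
fi
```

If nothing is bound a few seconds after startup, the process died rather than Ambari mis-detecting it.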

Hi @Jonathan Hurley, thanks for the response.

So it's a problem with the way Ambari tests whether or not the Accumulo TServer is started? That thread indicates a problem with Ambari in general, but all of my other Ambari services are running. It is only the TServer that starts, stops immediately, and is reported as not running. If it were running, wouldn't there be logs in those folders?

Do you have any suggestions as to what I can do to confirm that this is the problem? As mentioned, I am brand new to Ambari and HDP in general.

The TabletServer is the exception to this. It is the per-server process, not an HA component.

The content in the .err file is definitely the issue here. Might you be able to cross-reference one of the failing nodes with one that does work? See if that file exists on the other node; if it does, I'd recommend copying it over. I have not run into this problem myself before.

Hi @Josh Elser, thanks for the response!

I can compare their log files, but otherwise I'm not sure how exactly to cross-reference them, as there is little to no log of where the problem is happening. What should I look at for cross-referencing other than the logs? I copied over the log files (from my working nodes) and made sure the appropriate names were used, but the behavior was the same.

EDIT: I should add that I didn't copy over the .err files from the working nodes, as there are no errors there, so the file exists but is empty. I did remove the .err line from both instances; this didn't change the behavior, which I didn't expect it to.

Sorry, not the logs, but the filesystem layout and the configuration. Specifically, compare whether /usr/hdp/ exists on the nodes which are running correctly. My hunch is that you are missing a symlink or something beneath /usr/hdp/ or similar.
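As a sketch of that comparison (the helper function, the use of find, and the placeholder hostnames are my own assumptions, not from this thread):

```shell
# Hypothetical helper: list the layout under /usr/hdp on a remote node
# so a working node and a failing node can be diffed side by side.
layout() {  # usage: layout <hostname>
  ssh "$1" 'find /usr/hdp -maxdepth 3 2>/dev/null | sort'
}
# Example (placeholder hostnames):
#   diff <(layout working-node) <(layout failing-node)
```

Any path present on the working node but absent on the failing one is a candidate for the missing file or symlink.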

And yes, under normal circumstances, the .err file should be present but empty. It is the redirection of STDERR for the Accumulo service which should have nothing while the process is happily running.

@Josh Elser Your suspicion is right: /usr/hdp/ exists on the nodes which are correctly running TServer, and doesn't exist on the ones that aren't.

EDIT: I tried just adding the file to one of the incorrectly-running nodes, and the error changed to "Error: Could not find or load main class org.apache.accumulo.start.Main" ... so @Jonathan Hurley is most likely right about there being an issue with the configuration files, I think? Is there a place either of you recommend I look for this difference in configuration between this node and the others?

Thanks for your help.

If you're missing that file, it sounds like the package (RPM/Deb) was never correctly installed. You may be better off removing the service and re-adding it, if you have no data stored in Accumulo. I'm not sure what else to suggest at this point.
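Before re-adding the service, it may be worth confirming the package payload is actually broken. A sketch (the exact package name is release-specific, so query for it first rather than assuming one):

```shell
# List any installed Accumulo RPMs, then verify one against its manifest.
# "rpm -V <pkg>" prints missing or altered files; no output means intact.
pkgs=$(rpm -qa 2>/dev/null | grep -i accumulo || true)
if [ -n "$pkgs" ]; then
  echo "$pkgs"
  # rpm -V $pkgs   # uncomment to list files the package owns but are missing
else
  echo "no accumulo RPM found on this node"
fi
```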

Super Collaborator

That's not a correct file location. Everything under /usr/hdp/ must be HDP component names, like accumulo-tablet or accumulo-client.

I'm not sure how that directory is constructed in the Accumulo files, but something seems to have an invalid value. Perhaps Accumulo isn't looking in the correct location for its configuration files? Could it be trying to look in /etc/default/accumulo, which doesn't exist?
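A quick way to test that guess on a failing node (/etc/default/accumulo is taken from the suggestion above; /etc/accumulo/conf is my assumption for the HDP config directory):

```shell
# Check whether the candidate config locations exist on this node.
for f in /etc/default/accumulo /etc/accumulo/conf; do
  if [ -e "$f" ]; then echo "present: $f"; else echo "missing: $f"; fi
done
```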

This is the /usr/hdp/current/accumulo-client/bin/accumulo file that gives the error; it is identical on both nodes, the one whose TServer crashes and the one whose TServer seems to work fine.


. /usr/hdp/
. /usr/hdp/

# Autodetect JAVA_HOME if not defined
if [ -e /usr/libexec/bigtop-detect-javahome ]; then
  . /usr/libexec/bigtop-detect-javahome
elif [ -e /usr/lib/bigtop-utils/bigtop-detect-javahome ]; then
  . /usr/lib/bigtop-utils/bigtop-detect-javahome
fi

exec /usr/hdp/ "$@"

so I could change the line ". /usr/hdp/", but if that's what's causing my problems, then all of the nodes running it would/should have the same problem.

I will compare their configs a little more to see if there's a difference somewhere.

@Jonathan Hurley, no... this is expected. From a fresh HDP 2.3.4 install I just did:

# ls -l /usr/hdp/
-rw-r--r--. 1 root root 975 Jun 23 16:12 /usr/hdp/
# yum whatprovides /usr/hdp/
accumulo_2_3_6_0_3796- : Apache Accumulo is a distributed Key Value store based on Google's BigTable
Repo        : HDP-2.3
Matched from:
Filename    : /usr/hdp/

accumulo_2_3_6_0_3796- : Apache Accumulo is a distributed Key Value store based on Google's BigTable
Repo        : installed
Matched from:
Other       : Provides-match: /usr/hdp/

Thank you both for the responses, @Josh Elser; at least I know it's not some obvious mistake I made. I will re-install the service and see what happens.

Super Collaborator

Ah, OK - I had misread the directory as /usr/hdp/current; indeed, etc and other directories are allowed in /usr/hdp/.

Ambari actually doesn't do anything with those directories. We only manipulate the symlinks in /usr/hdp/current and /etc

I'm curious if this is a problem with



I'm getting the same error after installation.

I grepped the /var/log/accumulo/tserver_hostname.log and found a report of:

ERROR: Exception while checking mount points, halting process /proc/mounts (Too many open files)

After looking at the open files, I discovered 136K open files for java and 106K for jsvc; given that I set a descriptor limit of 20K, I think this might be my problem.

$> lsof | awk '{print $1}' | sort | uniq -c | sort -n -k1
106000 jsvc
136000 java

I'm digging into this now too. This cluster has no jobs running, so I'm surprised to see so many open files...
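For what it's worth, the per-command count above can be wrapped into a reusable helper, and the limits checked alongside it. This is a sketch; the limits.conf entry at the end is the usual route for raising the limit persistently, with the user name and values as placeholders to adjust for your site.

```shell
# Show the current soft and hard open-file limits for this shell;
# "Too many open files" means the soft limit was hit.
ulimit -Sn
ulimit -Hn

# Same per-command open-file count as the lsof pipeline above, but from
# a saved dump so it can be re-run without holding lsof open:
#   lsof > /tmp/lsof.txt && count_fds /tmp/lsof.txt
count_fds() { awk '{print $1}' "$1" | sort | uniq -c | sort -rn; }

# To raise the limit persistently, an /etc/security/limits.conf entry
# like this is typical (user name and values are placeholders):
#   accumulo  soft  nofile  65536
#   accumulo  hard  nofile  65536
```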
