Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Accumulo TServer failing after Wizard Install completion

avatar

So I've just completed the Ambari install wizard (Ambari 2.2, HDP 2.3.4) and started the services.Everything is running and working now (not at first) except for 3/5 hosts have the same issue with the Accumulo TServer. It starts up with the solid green check (photo attached) but stops after just a few seconds with the alert icon.

The only information I found about the error is "Connection failed: [Errno 111] Connection refused to mo-31aeb5591.mo.sap.corp:9997" which I showed in the t-server-process. I checked my ssh connection and its fine, and all of the other services installed fine so I'm not sure what exactly that means. I posted the logs below, the .err file just said no such directory, and the .out file is empty. Are there other locations where there is more verbose err logs about this? As said, I am new to the environment.

Any general troubleshooting advice for initial issues after installation or links to guides that may help would also be very appreciated.

[root@xxxxxxxxxxxx ~]# cd /var/log/accumulo/
[root@xxxxxxxxxxxx accumulo]# ls
accumulo-tserver.err  accumulo-tserver.out
[root@xxxxxxxxxxxx accumulo]# cat accumulo-tserver.err
/usr/hdp/current/accumulo-client/bin/accumulo: line 4: /usr/hdp/2.3.4.0-3485/etc/default/accumulo: No such file or directory

6960-dashboard.png

6961-start-success.png

6962-tserver-process-states.png

1 ACCEPTED SOLUTION

avatar

I ended up just removing the TServer service from the nodes that were failing. Not really a solution, but the others ones still work fine. Thanks for your help!

View solution in original post

15 REPLIES 15

avatar

This is the /usr/hdp/current/accumulo-client/bin/accumulo file that gives the error, which is identical on both nodes, the one whose TServer crashes and one who's tserver seems to work fine.

#!/bin/sh

. /usr/hdp/2.3.4.0-3485/etc/default/hadoop
. /usr/hdp/2.3.4.0-3485/etc/default/accumulo

# Autodetect JAVA_HOME if not defined
if [ -e /usr/libexec/bigtop-detect-javahome ]; then
  . /usr/libexec/bigtop-detect-javahome
elif [ -e /usr/lib/bigtop-utils/bigtop-detect-javahome ]; then
  . /usr/lib/bigtop-utils/bigtop-detect-javahome
fi

export HDP_VERSION=${HDP_VERSION:-2.3.4.0-3485}
export ACCUMULO_OTHER_OPTS="-Dhdp.version=${HDP_VERSION} ${ACCUMULO_OTHER_OPTS}"

exec /usr/hdp/2.3.4.0-3485//accumulo/bin/accumulo.distro "$@"

so I could change the line ". /usr/hdp/2.3.4.0-3485/etc/default/accumulo" but if that's whats causing me problems then all of the nodes running it would/should have the same problem.

I will compare their configs a little more to see if there's a difference somewhere.

avatar
Super Guru

@Jonathan Hurley, no... this is expected. From a fresh HDP2.3.4 install I just did:

# ls -l /usr/hdp/2.3.6.0-3796/etc/default/accumulo
-rw-r--r--. 1 root root 975 Jun 23 16:12 /usr/hdp/2.3.6.0-3796/etc/default/accumulo
# yum whatprovides /usr/hdp/2.3.6.0-3796/etc/default/accumulo
accumulo_2_3_6_0_3796-1.7.0.2.3.6.0-3796.el6.x86_64 : Apache Accumulo is a distributed Key Value store based on Google's BigTable
Repo        : HDP-2.3
Matched from:
Filename    : /usr/hdp/2.3.6.0-3796/etc/default/accumulo



accumulo_2_3_6_0_3796-1.7.0.2.3.6.0-3796.el6.x86_64 : Apache Accumulo is a distributed Key Value store based on Google's BigTable
Repo        : installed
Matched from:
Other       : Provides-match: /usr/hdp/2.3.6.0-3796/etc/default/accumulo

avatar

Thank you for both responses @Josh Elser, at least I know its not some obvious mistake I made. I will re-install the service and see what happens.

avatar
Super Collaborator

Ah, OK - I had misread the directory is /usr/hdp/current; indeed etc and other directories are allowed in /usr/hdp/2.3.6.0-3796

Ambari actually doesn't do anything with those directories. We only manipulate the symlinks in /usr/hdp/current and /etc

I'm curious if this is a problem with 2.3.4.0

avatar

I ended up just removing the TServer service from the nodes that were failing. Not really a solution, but the others ones still work fine. Thanks for your help!

avatar
Explorer

I'm getting the same error after installation.

I grepped the /var/log/accumulo/tserver_hostname.log and found a report of:

ERROR: Exception while checking mount points, halting process
java.io.FileNotFoundException: /proc/mounts (Too many files open)

After looking the open files I discovered 136K java open files and 106K jsvc open files, given I set a descriptor limit of 20K I think this might be my problem

$> lsof | awk '{print $1}' | sort | uniq -c | sort -n -k1
...
106000 jsvc
136000 java

I'm digging into this now too. This cluster has no jobs running I'm surprised to see so many open files...