Support Questions

vaibhavthapliyal · 11-19-2014

Dear all,

I am trying to run a basic write query on a multi-node (3-node) Accumulo cluster. When I run my code from a node inside the cluster it works fine, but when I run the same code on a machine that is not a node in the cluster, it does nothing and exits with this message in the console:

"WARN: Failed to find an available server from a list of servers."

The machine fails to connect to the ZooKeeper servers.

Has anybody encountered this problem?

Any help would be useful.

Thanks and Regards,
Vaibhav

busbey · 11-27-2014

As Mike previously mentioned, those configuration files don't exist when the cluster is managed by Cloudera Manager (CM).

It sounds like the underlying problem might be incorrect host name resolution. Accumulo and Hadoop require forward and reverse DNS to be set up correctly. You should not have IP addresses in your configuration files.
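A minimal bash sketch of the forward/reverse check, to run on the client and on each cluster node for every cluster host name. It assumes a Linux machine with the "getent" command, which answers from the same sources the system resolver uses (/etc/hosts and DNS, per nsswitch.conf). The function names are hypothetical helpers, not part of Accumulo, Hadoop or CM.

```shell
# Hypothetical helpers (not part of Accumulo/Hadoop); assumes getent is installed.

# first_ip NAME: print the first address the resolver returns for NAME
first_ip() {
  getent hosts "$1" | awk 'NR == 1 { print $1 }'
}

# canonical_name ADDR: print the first (canonical) name the resolver returns for ADDR
canonical_name() {
  getent hosts "$1" | awk 'NR == 1 { print $2 }'
}

# check_dns NAME: succeed only if NAME -> IP -> NAME round-trips
check_dns() {
  local name=$1 ip back
  ip=$(first_ip "$name")
  if [ -z "$ip" ]; then
    echo "FAIL: $name does not resolve" >&2
    return 1
  fi
  back=$(canonical_name "$ip")
  if [ "$back" != "$name" ]; then
    echo "FAIL: $name -> $ip -> ${back:-<no reverse entry>}" >&2
    return 1
  fi
  echo "OK:   $name -> $ip -> $back"
}
```

For example: for h in zoo1.example.com zoo2.example.com zoo3.example.com; do check_dns "$h"; done (the example host names from this thread). When the answer comes from /etc/hosts, the reverse lookup reports the first name on the matching line, so put the FQDN before any short alias there.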

If the problem is incorrect host names, there are a few things you can check:

1) What does CM think the names of the hosts are?

If you go to http://cm.example.com:7180/cmf/hardware/hosts (where "cm.example.com" is the name of your CM host), what is listed in the "name" column? These should all be fully qualified domain names.

2) What does the host think its name is?

Log into each of the cluster machines and run the "hostname" command. It should return a fully qualified domain name, and that name should match the one in the "name" column above.
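Item 2 can be scripted in the same spirit. A bash sketch; is_fqdn and check_local_name are hypothetical names, and "fully qualified" is approximated here as "contains a dot and is not a bare IPv4 address", which is a heuristic rather than a complete check.

```shell
# Hypothetical helpers for item 2.

# is_fqdn NAME: succeed if NAME is a dotted host name (not a bare IPv4 address)
is_fqdn() {
  case "$1" in
    *[!0-9.]*) ;;        # has a character other than digits/dots: keep checking
    *) return 1 ;;       # empty, or only digits and dots (an IPv4 address)
  esac
  case "$1" in
    *?.?*) return 0 ;;   # at least one dot with characters on both sides
    *) return 1 ;;
  esac
}

# check_local_name EXPECTED: compare `hostname` output with the name CM lists
check_local_name() {
  local expected=$1 actual
  actual=$(hostname)
  if ! is_fqdn "$actual"; then
    echo "FAIL: hostname '$actual' is not fully qualified" >&2
    return 1
  fi
  if [ "$actual" != "$expected" ]; then
    echo "FAIL: hostname '$actual' does not match CM name '$expected'" >&2
    return 1
  fi
  echo "OK:   $actual"
}
```

Run it on each node with the name CM shows for that node, e.g. check_local_name tserver1.example.com on tserver1.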

3) What do Accumulo processes think the host names are?

You can see this by looking inside of ZooKeeper. Because ZooKeeper is used to maintain critical information for Accumulo, you should be very careful while dealing with it directly. It's also important to note that this information is deep in the internals of Accumulo; you must not presume it will be the same across versions. Below I'll show an example from a cluster running Accumulo 1.6.0-cdh5.1.0.

Connect to ZooKeeper and see what shows up for tablet servers in the /accumulo/%UUID%/tservers node:

$ zookeeper-client -server zoo1.example.com,zoo2.example.com,zoo3.example.com
Connecting to zoo1.example.com,zoo2.example.com,zoo3.example.com
... SNIP ...
2014-11-27 10:50:11,499 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=zoo1.example.com,zoo2.example.com,zoo3.example.com sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@8d80be3
Welcome to ZooKeeper!
2014-11-27 10:50:11,535 [myid:] - INFO  [main-SendThread(zoo2.example.com:2181):ClientCnxn$SendThread@975] - Opening socket connection to server zoo2.example.com/10.17.72.3:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
JLine support is enabled
2014-11-27 10:50:11,546 [myid:] - INFO  [main-SendThread(zoo2.example.com:2181):ClientCnxn$SendThread@852] - Socket connection established to zoo2.example.com/10.17.72.3:2181, initiating session
2014-11-27 10:50:11,560 [myid:] - INFO  [main-SendThread(zoo2.example.com:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server zoo2.example.com/10.17.72.3:2181, sessionid = 0x349ace5c95e63c4, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: zoo1.example.com,zoo2.example.com,zoo3.example.com(CONNECTED) 0] ls /accumulo/e8f3afdf-a59c-4ebd-ae0c-15d47b9dd5e1/

users               problems            monitor             root_tablet         hdfs_reservations   gc
table_locks         namespaces          recovery            fate                tservers            tables
next_file           tracers             config              masters             bulk_failed_copyq   dead
[zk: zoo1.example.com,zoo2.example.com,zoo3.example.com(CONNECTED) 0] ls /accumulo/e8f3afdf-a59c-4ebd-ae0c-15d47b9dd5e1/tservers 
[tserver1.example.com:10011,tserver2.example.com:10011,tserver3.example.com:10011,tserver4.example.com:10011,tserver5.example.com:10011]
[zk: zoo1.example.com,zoo2.example.com,zoo3.example.com(CONNECTED) 1]
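To check those exact names from the client, the tserver list can be extracted from the ZooKeeper client output. This sketch assumes the list is printed on one line in square brackets, as in the session above, and that the CDH zookeeper-client wrapper, like upstream zkCli.sh, accepts a command after the -server arguments and runs it non-interactively. parse_tservers is a hypothetical helper; the UUID is the one from this example cluster.

```shell
# parse_tservers (hypothetical): read ZooKeeper client output on stdin and print
# one tablet server host name per line, with the port stripped. Prompt lines such
# as "[zk: ...(CONNECTED) 1]" are skipped; the last bracketed list is used.
parse_tservers() {
  grep -E '^\[.*]$' |
    grep -v '^\[zk:' |
    tail -n 1 |
    sed -e 's/[][ ]//g' |
    tr ',' '\n' |
    sed -e 's/:[0-9]*$//'
}

# Example (hosts and UUID from the session above):
# zookeeper-client -server zoo1.example.com,zoo2.example.com,zoo3.example.com \
#   ls /accumulo/e8f3afdf-a59c-4ebd-ae0c-15d47b9dd5e1/tservers 2>&1 | parse_tservers
```

Every name it prints must be resolvable, forward and reverse, from the machine running the client.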

The UUID under the top-level /accumulo node is the internal ID used to track your Accumulo instance. If there are several, you can find the one for your current cluster by listing all instance information (presuming you have an Accumulo gateway on the node). This utility is also an Accumulo internal, so neither its name, usage, nor output format can be counted on across versions.

$>  accumulo org.apache.accumulo.server.util.ListInstances
INFO : Using ZooKeepers zoo1.example.com,zoo2.example.com,zoo3.example.com

 Instance Name       | Instance ID                          | Master                        
---------------------+--------------------------------------+-------------------------------
          "accumulo" | e8f3afdf-a59c-4ebd-ae0c-15d47b9dd5e1 |master2.example.com:10010
         "dedicated" | 496b74ab-316c-41bc-badb-4b908039f725 |                              
         "dedicatum" | e49b451b-4607-4a0e-9dda-b49dc938080e |

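When several instances are listed, the ID for a given instance name can be pulled out of that table, for example to build the tservers path above. A bash/awk sketch that assumes the pipe-separated layout shown here, which, being an Accumulo internal, may differ in other versions; instance_id is a hypothetical helper.

```shell
# instance_id NAME (hypothetical): read ListInstances output on stdin and print
# the instance ID for the quoted instance NAME; fail if NAME is not listed.
instance_id() {
  awk -F'|' -v want="\"$1\"" '
    {
      name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the name column
      id   = $2; gsub(/^[ \t]+|[ \t]+$/, "", id)     # trim the ID column
    }
    name == want { print id; found = 1; exit }
    END { exit !found }'
}

# Example:
# accumulo org.apache.accumulo.server.util.ListInstances | instance_id accumulo
```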
4) Is HDFS confused?

Can you use hdfs commands from both inside and outside the cluster? For example, can you list the root directory? Can the Accumulo user list its home directory or the /accumulo directory?
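The listings from item 4 as a small script. This is a sketch under assumptions the answer does not state: the Accumulo service user is "accumulo", its HDFS home directory is /user/accumulo, sudo is available, and the host has an HDFS client configuration. On a Kerberos-secured cluster you would need a valid ticket instead of relying on sudo. try_ls and check_hdfs are hypothetical names.

```shell
# try_ls LABEL CMD... (hypothetical): run an HDFS listing and report the result
try_ls() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK:   $label"
  else
    echo "FAIL: $label"
    return 1
  fi
}

# check_hdfs (hypothetical): the item 4 listings; non-zero if any failed.
# Assumes the service user "accumulo" and home directory /user/accumulo.
check_hdfs() {
  local status=0
  try_ls "ls / (current user)" hdfs dfs -ls / || status=1
  try_ls "ls /user/accumulo (accumulo user)" sudo -u accumulo hdfs dfs -ls /user/accumulo || status=1
  try_ls "ls /accumulo (accumulo user)" sudo -u accumulo hdfs dfs -ls /accumulo || status=1
  return $status
}
```

Running check_hdfs once on a cluster node and once on the outside client makes any difference between the two visible.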

vaibhavthapliyal · 11-27-2014

Hi,

Thanks everyone for your valuable inputs.

I was finally able to troubleshoot the problem with my cluster. As it turns out, the remote machine needed the IP addresses of the cluster nodes added to its /etc/hosts file.

After doing that I was able to get the desired results.
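On the client, the fix amounts to making every cluster host name resolvable locally. A likely reason it matters: tablet servers register in ZooKeeper by host name (see the tservers listing earlier in this thread), so the client must resolve those names before it can reach them. In the sketch below the addresses and names in the comment are placeholders (192.0.2.0/24 is a documentation range), not the poster's actual entries, and hosts_entry is a hypothetical helper for confirming that a name is mapped. Putting the FQDN first on each line keeps reverse lookups consistent with busbey's advice.

```shell
# Placeholder /etc/hosts lines for the client (use the cluster's real IPs/FQDNs):
#   192.0.2.11  node1.example.com  node1
#   192.0.2.12  node2.example.com  node2
#   192.0.2.13  node3.example.com  node3

# hosts_entry NAME [FILE] (hypothetical): print the first non-comment hosts line
# that maps NAME as its canonical name or an alias; fail if there is none
hosts_entry() {
  awk -v n="$1" '
    /^[ \t]*#/ { next }
    { for (i = 2; i <= NF; i++) if ($i == n) { print; found = 1; exit } }
    END { exit !found }' "${2:-/etc/hosts}"
}
```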

Thanks everyone!!!

Vaibhav

mdrob · 11-19-2014

Which versions are you running?

Is your host outside of the cluster able to communicate with all of the hosts inside the cluster?
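Being able to reach a host (for example over ssh) does not show that the ZooKeeper and Accumulo ports are open. A bash sketch to test them, assuming an nc that supports -z and -w (OpenBSD and traditional netcat do; some other variants do not). The ports are the ones that appear in this thread's ZooKeeper session and ListInstances output (2181, 10010, 10011); check your own configuration for the actual values. port_open and check_ports are hypothetical names.

```shell
# port_open HOST PORT (hypothetical): succeed if a TCP connection opens within 3 s
port_open() {
  nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# check_ports HOST PORT... (hypothetical): report each port; fail if any is unreachable
check_ports() {
  local host=$1 port status=0
  shift
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "OK:   $host:$port"
    else
      echo "FAIL: $host:$port"
      status=1
    fi
  done
  return $status
}

# Example (names and ports from this thread):
# check_ports zoo1.example.com 2181
# check_ports master2.example.com 10010
# check_ports tserver1.example.com 10011
```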

vaibhavthapliyal · 11-19-2014

Hi,

I am running Hadoop 2.5.0-cdh5.2.0, and the Accumulo version is 1.6.0-cdh5.1.0.

Yes, my host is able to communicate with the nodes inside the cluster (via ssh).

Thanks for replying

Vaibhav

mdrob · 11-20-2014

Can you verify that your client can communicate with all of your ZK servers?

echo ruok | telnet zk1 2181

substituting your actual ZooKeeper hosts for "zk1".

vaibhavthapliyal · 11-20-2014

Hi,

I ran the command you suggested from a machine outside the cluster. This is the response I got:

"Trying 192.168.0.125... Connected to 192.168.0.125. Escape character is '^]'. Connection closed by foreign host."

I got the same response on all the nodes.

Does this mean I am connected to the zookeeper server with the outside machine?

Please help.

Thanks

Vaibhav

vaibhavthapliyal · 11-21-2014

Hi,

I made some advances towards solving the problem.

What I did was create a standalone Accumulo installation, which again showed the same error. Then I changed the contents of the masters and slaves files in the /accumulo/conf folder and the error disappeared. Now I want to do the same on my main cluster using Cloudera Manager, but I am having trouble figuring out how to view or change the contents of the masters and slaves files that would have been created when Accumulo was installed with Cloudera Manager.

Please help.

Thanks

Vaibhav

mdrob · 11-21-2014

I'm sorry, I made a mistake with the earlier command. It should have been echo ruok | nc zk1 2181. However, it does look like you are able to connect to zk, so no need to run it again.
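The corrected check, wrapped so it runs against every server in the ensemble. A healthy ZooKeeper server answers "ruok" with "imok": an open TCP connection only shows the port is reachable, while "imok" also shows the server considers itself healthy. zk_ok and check_ensemble are hypothetical helper names, and the host names are this thread's examples.

```shell
# zk_ok HOST [PORT] (hypothetical): send ZooKeeper's "ruok" four-letter command
# and succeed only if the server answers "imok"
zk_ok() {
  [ "$(echo ruok | nc "$1" "${2:-2181}" 2>/dev/null)" = "imok" ]
}

# check_ensemble HOST... (hypothetical): check each server; fail if any is not ok
check_ensemble() {
  local zk status=0
  for zk in "$@"; do
    if zk_ok "$zk"; then
      echo "OK:   $zk"
    else
      echo "FAIL: $zk"
      status=1
    fi
  done
  return $status
}

# Example:
# check_ensemble zoo1.example.com zoo2.example.com zoo3.example.com
```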

I'm not sure I understand what steps you took to replicate the problem. Are you using a Java program to write to Accumulo, or doing this via shell commands?

Normally, the masters/slaves files are used by the cluster management scripts included with Accumulo. Since CM has its own tools for starting and stopping processes, it does not rely on those files to track which roles are on which hosts, and they do not exist in a managed installation.

vaibhavthapliyal · 11-21-2014

Hi,
Thanks for replying.
This is what I did to replicate the problem:

1. First, I installed Accumulo 1.6 in standalone mode using the instructions at this link:

https://github.com/lumifyio/lumify/blob/master/docs/setup-centos-6.5.md

In this manner I got my instance of Accumulo up and running.

2. Next, I ran a Java program that does a basic write operation, following the Accumulo user manual here:

http://accumulo.apache.org/1.6/accumulo_user_manual.html#_writing_accumulo_clients

I ran this Java program from two machines: first on the machine where the standalone Accumulo was installed, and second on a machine without Accumulo.

The second machine gave me the above mentioned error.

To troubleshoot this error, I changed the contents of the masters and slaves files from "localhost" to their IP addresses.
This solved the error, so I suspect that changing the contents of the masters and slaves files may solve the same problem on my main cluster.

Is there a way I can do that?

I also referred to your Accumulo manual given at this link:

https://www.google.co.in/url?sa=t&source=web&rct=j&ei=SGVvVKzqHdCLuATAuIH4BA&url=http://www.cloudera...

On page 13 it says:

"On a multi-host cluster, replace localhost with the fully qualified domain name (FQDN) or IP address of the Accumulo Master in the masters, monitor, gc and tracers files in /etc/accumulo/conf, and add the FQDN or IP address of the TabletServers (one per line) to the /etc/accumulo/conf/slaves file."
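For a standalone or tarball install (not a Cloudera Manager-managed one, where, as noted in this thread, these files do not exist), the quoted step can be scripted roughly as below. The host names are hypothetical, the directory is the /etc/accumulo/conf path from the quote, and FQDNs are used rather than IP addresses, in line with the DNS advice in this thread. write_host_files is a hypothetical helper.

```shell
# write_host_files CONF_DIR MASTER TSERVER... (hypothetical): write the master's
# FQDN into masters, monitor, gc and tracers, and one tablet server per line
# into slaves, as the quoted manual describes for non-CM installs
write_host_files() {
  local dir=$1 master=$2 f
  shift 2
  for f in masters monitor gc tracers; do
    printf '%s\n' "$master" > "$dir/$f"
  done
  printf '%s\n' "$@" > "$dir/slaves"
}

# Example (hypothetical hosts; run as a user that can write the directory):
# write_host_files /etc/accumulo/conf master1.example.com \
#   tserver1.example.com tserver2.example.com tserver3.example.com
```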

So I want to know: is there a way I can achieve this in Cloudera Manager?

Please help.

Thanks
Vaibhav

vaibhavthapliyal · 11-27-2014

Dear all,

I am still struggling with this error. It would be very helpful if someone could help me out.

As I said, I have described my whole procedure for replicating this error and troubleshooting it on a standalone installation without Cloudera Manager.

Now I just need help doing the same using Cloudera Manager.

Thanks

Vaibhav

Cloudera Community

Problems running a write query on an accumulo cluster