
Problem with scm_prepare_node.sh locating the manager host (via $SSH_CLIENT)


New Contributor

Cloudera Manager 4.7.3 on RHEL 6u3.

 

I'm hitting this problem consistently when attempting to add/manage the same host that the Cloudera Manager service has just been installed on.  I'm referring to the machine by its FQDN (rather than localhost), and I'm using the root account and password directly for SSH access.  A parcel installation fails with:

 

Installation failed. Failed to execute installation script.
Failed to execute installation script. (Current Step) Last Refreshed: Nov 7, 2013 6:19:30 AM UTC
/tmp/scm_prepare_node.ElXio0Jo
/tmp/scm_prepare_node.ElXio0Jo/scm_prepare_node.sh: could not find hostname or IP address of SCM server
usage: /tmp/scm_prepare_node.ElXio0Jo/scm_prepare_node.sh [options]

 

When I look at that script, I see that it uses the environment variable $SSH_CLIENT, set by SSH, to establish the hostname of the Manager server.  When I ssh in as root, I can see that the variable is set correctly (it points to 127.0.0.1).

 

To get around this, I have to find the same scm_prepare_node.sh command in the SCM logs and manually append a --host argument to it.  It then completes successfully.

 

Is it possible that the Manager detects that it is installing to the local host and bypasses the SSH login, leaving SSH_CLIENT unset and causing the Manager-host detection in that script to fail?

 

Thanks

Brett

 

Accepted Solution

Re: Problem with scm_prepare_node.sh locating the manager host (via $SSH_CLIENT)

New Contributor

For the benefit of others who may encounter this, the root cause was eventually identified. The SSH-client-launched remote command was running under a much older version of bash, a version that had the (temporary) bug of not exporting the SSH_CLIENT variable to its environment.

 

How can this happen and go unnoticed? It turns out that when CM executes "ssh 'bash -c ...'", the remote SSH server uses a fixed search PATH to locate "bash", which may differ from the PATH you pick up in an interactive shell.

 

To check if you have this (unlikely) problem, run this from a machine remote from the target machine:

 

$ ssh you@yourmachine.com 'which bash'
/usr/local/bin/bash
$ ssh you@yourmachine.com 'bash --version'
GNU bash, version 2.05.8(1)-release (i386-redhat-linux-gnu)
$ ssh you@yourmachine.com 'env | grep SSH_CLIENT'
SSH_CLIENT=10.1.2.3 56617 22
$ ssh you@yourmachine.com 'bash -c "env | grep SSH_CLIENT"'
(nothing)

 

Note the very old version of bash reported here, and the non-standard path. When that "bash" is explicitly invoked, SSH_CLIENT is missing from its environment; compare this with the results from an interactive shell session. The version of bash above, and some other versions from around the same time, do not correctly export SSH_CLIENT.

 

The fix is to remove the bad version of bash from the target machine.
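As a quick local check (no ssh round trip needed), you can test whether a given bash binary passes an inherited SSH_CLIENT on to its children; swap in the suspect path (e.g. /usr/local/bin/bash) for bash_bin:

```shell
# Does this bash propagate an inherited SSH_CLIENT to child processes?
# A healthy bash prints the value; a broken one prints UNSET.
bash_bin=/bin/bash   # replace with the suspect binary, e.g. /usr/local/bin/bash
SSH_CLIENT="10.1.2.3 56617 22" "$bash_bin" -c 'echo "${SSH_CLIENT:-UNSET}"'
```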

 

Brett

6 Replies

Re: Problem with scm_prepare_node.sh locating the manager host (via $SSH_CLIENT)

Master Collaborator

What is the value of the "server_host" property inside this file?

 

/etc/cloudera-scm-agent/config.ini

 

Also, what does the 'hostname' command return?  I would grep the output of the hostname command against /etc/hosts and make sure your host's name isn't on the loopback (127.0.0.1) line of /etc/hosts.
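That loopback check can be scripted; a minimal sketch (point hosts_file at a copy of the file if you don't want to read the live one):

```shell
# Warn if this host's name appears on the 127.0.0.1 line of /etc/hosts.
hosts_file=/etc/hosts
name="$(hostname)"
if grep '^127\.0\.0\.1' "$hosts_file" | grep -qw "$name"; then
    echo "WARNING: $name is mapped to the loopback address" >&2
else
    echo "OK: $name is not on the loopback line"
fi
```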

Re: Problem with scm_prepare_node.sh locating the manager host (via $SSH_CLIENT)

New Contributor

Thanks for your reply.

 

There is a whiff of localhost.  My config.ini contains:

 

[General]
# Hostname of Cloudera SCM Server
server_host=localhost

# Port that server is listening on
server_port=7182

 

I'm not sure how or when the localhost address was selected.  'hostname' returns my host's short name:

 

# hostname
shortname

 

... and /etc/hosts contains something like:

 

# cat /etc/hosts
127.0.0.1 localhost.localdomain loghost
10.123.12.34 shortname.company.com shortname

 

I smell a rat in the networking configuration of this VM, but I can't be sure; the VM template isn't mine.  The 'loghost' alias instead of 'localhost' on the loopback line above looks wrong, for starters.

 

I've proceeded with an RPM install of the Manager, followed by parcels, and that looks to be working better, so I'll stick with that.

 

Thanks again

Brett

 

 

Re: Problem with scm_prepare_node.sh locating the manager host (via $SSH_CLIENT)

Super Collaborator

The /etc/hosts file should look like this on all nodes, where cehd3.test.lab is the node's FQDN and 10.100.101.43 is its IP:

 

[root@cehd3 conf]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.100.101.43 cehd3.test.lab cehd3

 

Once that is in place (regardless of DNS configuration), confirm with the following:

 

python -c "import socket; print(socket.getfqdn()); print(socket.gethostbyname(socket.getfqdn()))"
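The same check can be done without Python, assuming getent is available (it consults /etc/hosts and DNS according to nsswitch.conf):

```shell
# The FQDN should resolve to the host's real IP, not a loopback address.
fqdn="$(hostname -f)"
ip="$(getent hosts "$fqdn" | awk '{print $1; exit}')"
case "$ip" in
    127.*|::1) echo "BAD: $fqdn resolves to loopback ($ip)" ;;
    "")        echo "BAD: $fqdn does not resolve" ;;
    *)         echo "OK: $fqdn -> $ip" ;;
esac
```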

Re: Problem with scm_prepare_node.sh locating the manager host (via $SSH_CLIENT)

Super Collaborator

Yep, you got it!

Re: Problem with scm_prepare_node.sh locating the manager host (via $SSH_CLIENT)

New Contributor

Thanks.

 

The Python test command output something like this:

 

shortname.company.com

10.123.12.34

 

I'd be curious to know how the deployer launches scm_prepare_node.sh, i.e. the path from the running process, through ssh, to bash (presumably), to the script itself.  It's hard to understand why SSH_CLIENT is not set (unless ssh is not being used to invoke the script remotely, even though everything is on localhost), but that does appear to be the case.

 

Brett
