Support Questions

Find answers, ask questions, and share your expertise

CM cannot copy installation files to other servers

avatar
Contributor

Hello everyone.

 

I'm going through "Installation path B" for a production cluster.  MySQL installed and configured fine, Cloudera Manager starts up fine, and I'm going through the familiar steps of choosing hosts in the cluster on which to have CM install the agent software.

 

I get to the "Install Step" in CM, and one of the very first thing it does is "Copying installation files" to the other hosts.  This process never completes, and the file on the target system just sits at 0 bytes forever until the process times out in CM.  

 

My /etc/hosts file seems correct to me:

 

CM Server host:

root@bdprodm09:[57]:~> python -c "import socket; print socket.getfqdn(); print socket.gethostbyname(socket.getfqdn())"
bdprodm09.dbhotelcloud.com
172.0.30.1

 

Agent host:

root@bdprodm10:[37]:~> python -c "import socket; print socket.getfqdn(); print socket.gethostbyname(socket.getfqdn())"
bdprodm10.dbhotelcloud.com
172.0.30.2

 

I traced the packets with tcpdump between those two IPs, and port 22, and see the traffic, and no errors.  But, in the target /tmp/XXX directory, I just get this:

 

root@bdprodm10:[52]:/tmp/scm_prepare_node.f0oOug56> ls -l
total 0
-rw-r--r-- 1 root root 0 May 2 21:47 scm_prepare_node.sh

 

This particular time I tried it as root, but I've had the same thing happen with another user account (cminstall) that is set up with passwordless su - root as well:

 

root@bdprodm10:[57]:/tmp> ls -l scm_prepare_node.uF3EVkb4
total 0
-rw-r--r-- 1 cminstall cminstall 0 May 2 21:41 scm_prepare_node.sh

 

I tried becoming the 'cminstall' user and manually scp'ing a file from one host to the other, and it worked fine:

 

[cminstall@bdprodm09 ~]$ scp testfile cminstall@bdprodm10.dbhotelcloud.com:/tmp
cminstall@bdprodm10.dbhotelcloud.com's password:
testfile 100% 33 0.0KB/s 00:00

 

---

root@bdprodm10:[58]:/tmp> ls -l testfile
-rw-rw-r-- 1 cminstall cminstall 33 May 2 22:07 testfile

 

The cloudera-scm-server log file doesn't say anything that indicates what is failing:

2016-05-02 22:01:29,243 INFO 164956425@scm-web-36:com.cloudera.server.cmf.node.NodeConfiguratorService: Retrying configurator with id 0
2016-05-02 22:01:29,244 INFO 164956425@scm-web-36:com.cloudera.server.cmf.node.NodeConfiguratorService: Submitted configurator for bdprodm10.dbhotelcloud.com with id 1
2016-05-02 22:01:29,244 INFO NodeConfiguratorThread-6-1:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm10.dbhotelcloud.com: Transitioning from INIT (PT0.001S) to CONNECT
2016-05-02 22:01:29,245 INFO NodeConfiguratorThread-6-1:net.schmizz.sshj.transport.TransportImpl: Client identity string: SSH-2.0-SSHJ_0_14_0
2016-05-02 22:01:29,253 INFO NodeConfiguratorThread-6-1:net.schmizz.sshj.transport.TransportImpl: Server identity string: SSH-2.0-OpenSSH_5.3
2016-05-02 22:01:29,367 INFO NodeConfiguratorThread-6-1:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm10.dbhotelcloud.com: Transitioning from CONNECT (PT0.123S) to AUTHENTICATE
2016-05-02 22:01:29,458 INFO NodeConfiguratorThread-6-1:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm10.dbhotelcloud.com: Transitioning from AUTHENTICATE (PT0.091S) to MAKE_TEMP_DIR
2016-05-02 22:01:29,460 INFO NodeConfiguratorThread-6-1:com.cloudera.server.cmf.node.NodeConfigurator: Executing mktemp -d /tmp/scm_prepare_node.XXXXXXXX on bdprodm10.dbhotelcloud.com
2016-05-02 22:01:29,520 INFO NodeConfiguratorThread-6-1:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm10.dbhotelcloud.com: Transitioning from MAKE_TEMP_DIR (PT0.062S) to COPY_FILES

This is the first time I've had this problem!  I've done this install many times in the past, and I'm out of things to check 😞 

 

Hoping someone can shed some light on this one for me!  

 

Thanks very much for your time.

Chris

1 ACCEPTED SOLUTION

avatar
Master Collaborator

based on issues I've seen they most leads to jumbo framing configuration 😞 could you verify that is working [1]

alternative installation path is to install the packages manually on the node, then vi /etc/cloudera-scm-agent/config.ini and set the Hostname of the CM server - server_host=....

 

eg;

yum clean all
rpm --import http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera

# below is for the latest CM5 

# modify the baseurl in/etc/yum.repos.d/cloudera-manager.repo and point to the desire CM version

# example for version 5.5.1 baseurl=http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.5.1/

wget http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/cloudera-manager.repo -O /etc/yum.repos.d/cloudera-manager.repo
yum install -y oracle-j2sdk* cloudera-manager-{daemons,agent}

 

[1] https://www.mylesgray.com/hardware/test-jumbo-frames-working/

View solution in original post

14 REPLIES 14

avatar
Contributor

Just an update.

 

I tried setting the logging level to TRACE in /etc/cloudera-scm-server/log4j.properties, but that did not give me any more information, unfortunately.

 

The last thing I get in the logs is still:

 

Transitioning from MAKE_TEMP_DIR (PT0.044S) to COPY_FILES

 

Hoping someone might have an idea!

Thanks again,

Chris

 

avatar
Contributor

Another update. 🙂

 

I tried running the installer on just the node that the CM server runs, and things got a *little* further, then died at the "Acquiring installation lock...." step:

 

2016-05-03 14:09:46,814 INFO 1458067857@scm-web-14:com.cloudera.server.cmf.node.NodeScannerService: Request 2 contains 1 nodes
2016-05-03 14:09:46,814 INFO 1458067857@scm-web-14:com.cloudera.server.cmf.node.NodeScannerService: Existing scan of node bdprodm09.dbhotelcloud.com is too old, rescanning
2016-05-03 14:09:46,815 INFO 1458067857@scm-web-14:com.cloudera.server.cmf.node.NodeScannerService: Finished submitting request 2
2016-05-03 14:09:46,815 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScanner: Beginning scan of node bdprodm09.dbhotelcloud.com and port 22
2016-05-03 14:09:46,816 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScanner: Canonical hostname is bdprodm09.dbhotelcloud.com
2016-05-03 14:09:46,816 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScanner: Address is 172.0.30.1
2016-05-03 14:09:46,816 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScanner: Connecting to remote host
2016-05-03 14:09:46,816 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScanner: Disconnecting from remote host
2016-05-03 14:09:46,816 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScanner: Connecting to ssh service on remote host
2016-05-03 14:09:46,817 INFO NodeScannerThread-3:net.schmizz.sshj.transport.TransportImpl: Client identity string: SSH-2.0-SSHJ_0_14_0
2016-05-03 14:09:46,825 INFO NodeScannerThread-3:net.schmizz.sshj.transport.TransportImpl: Server identity string: SSH-2.0-OpenSSH_5.3
2016-05-03 14:09:46,847 INFO 1458067857@scm-web-14:com.cloudera.server.cmf.node.NodeScannerService: Request 2 returning 0/1 scans
2016-05-03 14:09:46,950 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScanner: Disconnecting from ssh service on remote host
2016-05-03 14:09:46,950 INFO NodeScannerThread-3:net.schmizz.sshj.transport.TransportImpl: Disconnected - BY_APPLICATION
2016-05-03 14:09:46,950 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScanner: Connected to SSH on node bdprodm09.dbhotelcloud.com with port 22 (latency PT0S)
2016-05-03 14:09:46,950 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScannerService: Request 2 observed finished scan of node bdprodm09.dbhotelcloud.com
2016-05-03 14:09:47,885 INFO 1458067857@scm-web-14:com.cloudera.server.cmf.node.NodeScannerService: Request 2 returning 1/1 scans
2016-05-03 14:10:05,991 INFO 1458067857@scm-web-14:com.cloudera.server.cmf.node.NodeConfiguratorService: Creating request with id 2
2016-05-03 14:10:05,991 INFO 1458067857@scm-web-14:com.cloudera.server.cmf.node.NodeConfiguratorService: Adding password-based configurator for bdprodm09.dbhotelcloud.com
2016-05-03 14:10:05,992 INFO 1458067857@scm-web-14:com.cloudera.server.cmf.node.NodeConfiguratorService: Submitted configurator for bdprodm09.dbhotelcloud.com with id 0
2016-05-03 14:10:05,992 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm09.dbhotelcloud.com: Transitioning from INIT (PT0.001S) to CONNECT
2016-05-03 14:10:05,992 INFO NodeConfiguratorThread-2-0:net.schmizz.sshj.transport.TransportImpl: Client identity string: SSH-2.0-SSHJ_0_14_0
2016-05-03 14:10:06,001 INFO NodeConfiguratorThread-2-0:net.schmizz.sshj.transport.TransportImpl: Server identity string: SSH-2.0-OpenSSH_5.3
2016-05-03 14:10:06,125 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm09.dbhotelcloud.com: Transitioning from CONNECT (PT0.133S) to AUTHENTICATE
2016-05-03 14:10:06,168 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm09.dbhotelcloud.com: Transitioning from AUTHENTICATE (PT0.043S) to MAKE_TEMP_DIR
2016-05-03 14:10:06,172 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfigurator: Executing mktemp -d /tmp/scm_prepare_node.XXXXXXXX on bdprodm09.dbhotelcloud.com
2016-05-03 14:10:06,194 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm09.dbhotelcloud.com: Transitioning from MAKE_TEMP_DIR (PT0.026S) to COPY_FILES
2016-05-03 14:10:08,108 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm09.dbhotelcloud.com: Transitioning from COPY_FILES (PT1.914S) to CHMOD
2016-05-03 14:10:08,110 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfigurator: Executing chmod a+x /tmp/scm_prepare_node.wI9Jc1pA/scm_prepare_node.sh on bdprodm09.dbhotelcloud.com
2016-05-03 14:10:08,130 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm09.dbhotelcloud.com: Transitioning from CHMOD (PT0.022S) to EXECUTE_SCRIPT
2016-05-03 14:10:08,170 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfigurator: Executing bash -c '/tmp/scm_prepare_node.wI9Jc1pA/scm_prepare_node.sh --server_version 5.7.0 --server_build 76 --packages /tmp/scm_prepare_node.wI9Jc1pA/packages.scm --always /tmp/scm_prepare_node.wI9Jc1pA/always_install.scm --x86_64 /tmp/scm_prepare_node.wI9Jc1pA/x86_64_packages.scm --skipImpala true --skipSolr true --skipKeyTrustee true --unlimitedJCE false --javaInstallStrategy NONE --agentUserMode ROOT | tee /tmp/scm_prepare_node.wI9Jc1pA/scm_prepare_node.log; exit ${PIPESTATUS[0]}' on bdprodm09.dbhotelcloud.com
2016-05-03 14:10:09,171 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm09.dbhotelcloud.com: Transitioning from EXECUTE_SCRIPT (PT1.041S) to SCRIPT_START
2016-05-03 14:10:09,171 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm09.dbhotelcloud.com: Transitioning from SCRIPT_START (PT0S) to TAKE_LOCK


2016-05-03 14:11:03,009 ERROR reader:net.schmizz.sshj.transport.TransportImpl: Dying because - {}
java.net.SocketException: Connection timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:170)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at net.schmizz.sshj.transport.Reader.run(Reader.java:50)
2016-05-03 14:11:03,012 INFO reader:net.schmizz.sshj.transport.TransportImpl: Disconnected - UNKNOWN
2016-05-03 14:11:03,015 INFO NodeConfiguratorThread-0-0:net.schmizz.sshj.transport.TransportImpl: Disconnected - BY_APPLICATION

Super confused here.  It seems to always have something to do with the scp/ssh commands, but I cannot see why things are failing, and failing at different points of the installation.  😞

 

Still looking though....

Chris

 

avatar
Contributor

Another update.

 

I checked the sshd logs to see if there might be something reported there, but unfortunately there was very little, and nothing to indicate an error or problem:

 

May  3 14:24:35 bdprodm10 sshd[102316]: Accepted password for cminstall from 172.0.30.1 port 53100 ssh2
May  3 14:24:35 bdprodm10 sshd[102316]: pam_unix(sshd:session): session opened for user cminstall by (uid=0)

I'm still inclined to believe that there is something ssh related that is causing the issue, just not sure what it would be.

 

Chris

avatar
Contributor

Another update 🙂

 

I increased the debug level of sshd with "-ddd", and I can see things are trying to happen on the remote server, but still nothing really jumps out at me as to what the problem is.  Also, I got an actual exception in the cloudera-scm-server log this time, which I probably wasn't patient enough to wait for in previous attempts:

 

2016-05-03 14:40:17,535 INFO 1048269425@scm-web-25:com.cloudera.server.cmf.node.NodeConfiguratorService: Retrying configurator with id 4
2016-05-03 14:40:17,536 INFO 1048269425@scm-web-25:com.cloudera.server.cmf.node.NodeConfiguratorService: Submitted configurator for bdprodm10.dbhotelcloud.com with id 5
2016-05-03 14:40:17,536 INFO NodeConfiguratorThread-3-5:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm10.dbhotelcloud.com: Transitioning from INIT (PT0.001S) to CONNECT
2016-05-03 14:40:17,537 INFO NodeConfiguratorThread-3-5:net.schmizz.sshj.transport.TransportImpl: Client identity string: SSH-2.0-SSHJ_0_14_0
2016-05-03 14:40:17,545 INFO NodeConfiguratorThread-3-5:net.schmizz.sshj.transport.TransportImpl: Server identity string: SSH-2.0-OpenSSH_5.3
2016-05-03 14:40:17,668 INFO NodeConfiguratorThread-3-5:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm10.dbhotelcloud.com: Transitioning from CONNECT (PT0.132S) to AUTHENTICATE
2016-05-03 14:40:17,712 INFO NodeConfiguratorThread-3-5:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm10.dbhotelcloud.com: Transitioning from AUTHENTICATE (PT0.044S) to MAKE_TEMP_DIR
2016-05-03 14:40:17,723 INFO NodeConfiguratorThread-3-5:com.cloudera.server.cmf.node.NodeConfigurator: Executing mktemp -d /tmp/scm_prepare_node.XXXXXXXX on bdprodm10.dbhotelcloud.com
2016-05-03 14:40:17,744 INFO NodeConfiguratorThread-3-5:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm10.dbhotelcloud.com: Transitioning from MAKE_TEMP_DIR (PT0.032S) to COPY_FILES
2016-05-03 14:55:49,571 ERROR reader:net.schmizz.sshj.transport.TransportImpl: Dying because - {}
java.net.SocketException: Connection timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:170)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at net.schmizz.sshj.transport.Reader.run(Reader.java:50)
2016-05-03 14:55:49,571 INFO reader:net.schmizz.sshj.transport.TransportImpl: Disconnected - UNKNOWN
2016-05-03 14:55:49,572 WARN NodeConfiguratorThread-3-5:com.cloudera.server.cmf.node.NodeConfigurator: Could not copy scm_prepare_node.sh to bdprodm10.dbhotelcloud.com
net.schmizz.sshj.common.SSHException: Connection timed out
        at net.schmizz.sshj.common.SSHException$1.chain(SSHException.java:36)
        at net.schmizz.sshj.common.SSHException$1.chain(SSHException.java:29)
        at net.schmizz.sshj.transport.TransportImpl.die(TransportImpl.java:591)
        at net.schmizz.sshj.transport.Reader.run(Reader.java:68)
Caused by: java.net.SocketException: Connection timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:170)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at net.schmizz.sshj.transport.Reader.run(Reader.java:50)

In case someone happens to be an sshd guru, here are the details of the debug output I captured as well:

 

debug3: fd 5 is not O_NONBLOCK
debug1: Server will not fork when running in debugging mode.
debug3: send_rexec_state: entering fd = 8 config len 573
debug3: ssh_msg_send: type 0
debug3: send_rexec_state: done
debug1: rexec start in 5 out 5 newsock 5 pipe -1 sock 8
debug1: inetd sockets after dupping: 3, 3
Connection from 172.0.30.1 port 53107
debug1: Client protocol version 2.0; client software version SSHJ_0_14_0
debug1: no match: SSHJ_0_14_0
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.3
debug2: fd 3 setting O_NONBLOCK
debug2: Network child is on pid 103351
debug3: preauth child monitor started
debug3: mm_request_receive entering
debug3: privsep user:group 74:74
debug1: permanently_set_uid: 74/74
debug1: list_hostkey_types: ssh-rsa,ssh-dss
debug1: SSH2_MSG_KEXINIT sent
debug3: Wrote 840 bytes for a total of 861
debug1: SSH2_MSG_KEXINIT received
debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: ssh-rsa,ssh-dss
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: none,zlib@openssh.com
debug2: kex_parse_kexinit: none,zlib@openssh.com
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: kex_parse_kexinit: curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256,ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: ecdsa-sha2-nistp256,ssh-rsa,ssh-dss,ssh-ed25519
debug2: kex_parse_kexinit: aes128-ctr,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,cast128-ctr,idea-cbc,idea-ctr,serpent128-cbc,serpent128-ctr,3des-ctr,twofish128-cbc,twofish128-ctr,arcfour,arcfour128
debug2: kex_parse_kexinit: aes128-ctr,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,cast128-ctr,idea-cbc,idea-ctr,serpent128-cbc,serpent128-ctr,3des-ctr,twofish128-cbc,twofish128-ctr,arcfour,arcfour128
debug2: kex_parse_kexinit: hmac-sha1,hmac-sha1-96,hmac-md5,hmac-md5-96,hmac-sha2-256,hmac-sha2-512
debug2: kex_parse_kexinit: hmac-sha1,hmac-sha1-96,hmac-md5,hmac-md5-96,hmac-sha2-256,hmac-sha2-512
debug2: kex_parse_kexinit: none
debug2: kex_parse_kexinit: none
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: mac_setup: found hmac-sha1
debug1: kex: client->server aes128-ctr hmac-sha1 none
debug3: mm_request_send entering: type 78
debug3: mm_request_receive_expect entering: type 79
debug3: mm_request_receive entering
debug3: monitor_read: checking request 78
debug3: mm_request_send entering: type 79
debug3: mm_request_receive entering
debug2: mac_setup: found hmac-sha1
debug1: kex: server->client aes128-ctr hmac-sha1 none
debug3: mm_request_send entering: type 78
debug3: mm_request_receive_expect entering: type 79
debug3: monitor_read: checking request 78
debug3: mm_request_receive entering
debug3: mm_request_send entering: type 79
debug3: mm_request_receive entering
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST received
debug3: mm_request_send entering: type 0
debug3: mm_choose_dh: waiting for MONITOR_ANS_MODULI
debug3: mm_request_receive_expect entering: type 1
debug3: mm_request_receive entering
debug3: monitor_read: checking request 0
debug3: mm_answer_moduli: got parameters: 1024 2048 8192
debug3: mm_request_send entering: type 1
debug2: monitor_read: 0 used once, disabling now
debug3: mm_request_receive entering
debug3: mm_choose_dh: remaining 0
debug1: SSH2_MSG_KEX_DH_GEX_GROUP sent
debug3: Wrote 280 bytes for a total of 1141
debug2: dh_gen_key: priv key bits set: 170/320
debug2: bits set: 997/2048
debug1: expecting SSH2_MSG_KEX_DH_GEX_INIT
debug2: bits set: 1035/2048
debug3: mm_key_sign entering
debug3: mm_request_send entering: type 5
debug3: mm_key_sign: waiting for MONITOR_ANS_SIGN
debug3: mm_request_receive_expect entering: type 6
debug3: mm_request_receive entering
debug3: monitor_read: checking request 5
debug3: mm_answer_sign
debug3: mm_answer_sign: signature 0x7f0e958c3090(271)
debug3: mm_request_send entering: type 6
debug2: monitor_read: 5 used once, disabling now
debug3: mm_request_receive entering
debug1: SSH2_MSG_KEX_DH_GEX_REPLY sent
debug2: kex_derive_keys
debug2: set_newkeys: mode 1
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug3: Wrote 848 bytes for a total of 1989
debug2: set_newkeys: mode 0
debug1: SSH2_MSG_NEWKEYS received
debug1: KEX done
debug3: Wrote 52 bytes for a total of 2041
debug1: userauth-request for user cminstall service ssh-connection method password
debug1: attempt 0 failures 0
debug3: mm_getpwnamallow entering
debug3: mm_request_send entering: type 7
debug3: mm_getpwnamallow: waiting for MONITOR_ANS_PWNAM
debug3: mm_request_receive_expect entering: type 8
debug3: mm_request_receive entering
debug3: monitor_read: checking request 7
debug3: mm_answer_pwnamallow
debug2: parse_server_config: config reprocess config len 573
debug3: mm_answer_pwnamallow: sending MONITOR_ANS_PWNAM: 1
debug3: mm_request_send entering: type 8
debug2: monitor_read: 7 used once, disabling now
debug3: mm_request_receive entering
debug2: input_userauth_request: setting up authctxt for cminstall
debug3: mm_start_pam entering
debug3: mm_request_send entering: type 50
debug3: mm_inform_authserv entering
debug3: monitor_read: checking request 50
debug3: mm_request_send entering: type 3
debug1: PAM: initializing for "cminstall"
debug3: mm_inform_authrole entering
debug3: mm_request_send entering: type 4
debug2: input_userauth_request: try method password
debug3: mm_auth_password entering
debug3: mm_request_send entering: type 11
debug3: mm_auth_password: waiting for MONITOR_ANS_AUTHPASSWORD
debug3: mm_request_receive_expect entering: type 12
debug3: mm_request_receive entering
debug1: PAM: setting PAM_RHOST to "172.0.30.1"
debug1: PAM: setting PAM_TTY to "ssh"
debug2: monitor_read: 50 used once, disabling now
debug3: mm_request_receive entering
debug3: monitor_read: checking request 3
debug3: mm_answer_authserv: service=ssh-connection, style=
debug2: monitor_read: 3 used once, disabling now
debug3: mm_request_receive entering
debug3: monitor_read: checking request 4
debug3: mm_answer_authrole: role=
debug2: monitor_read: 4 used once, disabling now
debug3: mm_request_receive entering
debug3: monitor_read: checking request 11
debug3: PAM: sshpam_passwd_conv called with 1 messages
debug1: PAM: password authentication accepted for cminstall
debug3: mm_answer_authpassword: sending result 1
debug3: mm_request_send entering: type 12
debug3: mm_request_receive_expect entering: type 51
debug3: mm_request_receive entering
debug3: mm_auth_password: user authenticated
debug3: mm_do_pam_account entering
debug3: mm_request_send entering: type 51
debug3: mm_request_receive_expect entering: type 52
debug3: mm_request_receive entering
debug1: do_pam_account: called
debug3: PAM: do_pam_account pam_acct_mgmt = 0 (Success)
debug3: mm_request_send entering: type 52
Accepted password for cminstall from 172.0.30.1 port 53107 ssh2
debug1: monitor_child_preauth: cminstall has been authenticated by privileged process
debug3: mm_do_pam_account returning 1
debug3: mm_get_keystate: Waiting for new keys
debug3: mm_request_receive_expect entering: type 25
debug3: mm_request_receive entering
debug3: Wrote 36 bytes for a total of 2077
debug3: mm_send_keystate: Sending new keys: 0x7f0e958c5130 0x7f0e958c29a0
debug3: mm_newkeys_to_blob: converting 0x7f0e958c5130
debug3: mm_newkeys_to_blob: converting 0x7f0e958c29a0
debug3: mm_send_keystate: New keys have been sent
debug3: mm_send_keystate: Sending compression state
debug3: mm_request_send entering: type 25
debug3: mm_send_keystate: Finished sending state
debug3: mm_newkeys_from_blob: 0x7f0e958d5160(127)
debug3: mm_request_send entering: type 80
debug3: mm_request_receive_expect entering: type 81
debug2: mac_setup: found hmac-sha1
debug3: mm_request_receive entering
debug3: mm_get_keystate: Waiting for second key
debug3: mm_newkeys_from_blob: 0x7f0e958d5160(127)
debug2: mac_setup: found hmac-sha1
debug3: mm_get_keystate: Getting compression state
debug3: mm_get_keystate: Getting Network I/O buffers
debug3: mm_request_receive_expect entering: type 80
debug3: mm_request_receive entering
debug3: mm_request_send entering: type 81
debug3: mm_share_sync: Share sync
debug3: mm_share_sync: Share sync end
debug1: temporarily_use_uid: 1000/1000 (e=0/0)
debug1: ssh_gssapi_storecreds: Not a GSSAPI mechanism
debug1: restore_uid: 0/0
debug1: SELinux support disabled
debug1: PAM: establishing credentials
debug3: PAM: opening session
debug1: temporarily_use_uid: 1000/1000 (e=0/0)
debug1: ssh_gssapi_storecreds: Not a GSSAPI mechanism
debug1: restore_uid: 0/0
User child is on pid 103352
debug3: mm_request_receive entering
debug1: PAM: establishing credentials
debug1: permanently_set_uid: 1000/1000
debug2: set_newkeys: mode 0
debug2: set_newkeys: mode 1
debug1: Entering interactive session for SSH2.
debug2: fd 6 setting O_NONBLOCK
debug2: fd 7 setting O_NONBLOCK
debug1: server_init_dispatch_20
debug1: server_input_channel_open: ctype session rchan 0 win 2097152 max 32768
debug1: input_session_request
debug1: channel 0: new [server-session]
debug2: session_new: allocate (allocated 0 max 10)
debug3: session_unused: session id 0 unused
debug1: session_new: session 0
debug1: session_open: channel 0
debug1: session_open: session 0: link with channel 0
debug1: server_input_channel_open: confirm session
debug3: Wrote 52 bytes for a total of 2129
debug1: server_input_channel_req: channel 0 request pty-req reply 1
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req pty-req
debug1: Allocating pty.
debug3: mm_request_send entering: type 26
debug3: mm_pty_allocate: waiting for MONITOR_ANS_PTY
debug3: mm_request_receive_expect entering: type 27
debug3: monitor_read: checking request 26
debug3: mm_request_receive entering
debug3: mm_answer_pty entering
debug2: session_new: allocate (allocated 0 max 10)
debug3: session_unused: session id 0 unused
debug1: session_new: session 0
debug3: mm_request_send entering: type 27
debug3: mm_answer_pty: tty /dev/pts/2 ptyfd 5
debug1: session_pty_req: session 0 alloc /dev/pts/2
debug3: mm_request_receive entering
debug3: Wrote 36 bytes for a total of 2165
debug1: server_input_channel_req: channel 0 request exec reply 1
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req exec
debug2: fd 3 setting TCP_NODELAY
debug2: channel 0: rfd 10 isatty
debug2: fd 10 setting O_NONBLOCK
debug3: fd 8 is O_NONBLOCK
debug3: Wrote 88 bytes for a total of 2253
debug1: Setting controlling tty using TIOCSCTTY.
debug3: Wrote 372 bytes for a total of 2625
debug1: Received SIGCHLD.
debug1: session_by_pid: pid 103353
debug1: session_exit_message: session 0 channel 0 pid 103353
debug2: channel 0: request exit-status confirm 0
debug1: session_exit_message: release channel 0
debug2: channel 0: write failed
debug2: channel 0: close_write
debug2: channel 0: send eow
debug2: channel 0: output open -> closed
debug3: mm_request_send entering: type 28
debug3: monitor_read: checking request 28
debug3: mm_answer_pty_cleanup entering
debug2: notify_done: reading
debug1: session_by_tty: session 0 tty /dev/pts/2
debug2: channel 0: read<=0 rfd 10 len -1
debug3: mm_session_close: session 0 pid 103352
debug2: channel 0: read failed
debug3: mm_session_close: tty /dev/pts/2 ptyfd 5
debug2: channel 0: close_read
debug1: session_pty_cleanup: session 0 release /dev/pts/2
debug2: channel 0: input open -> drain
debug3: Wrote 152 bytes for a total of 2777
debug2: channel 0: ibuf empty
debug2: channel 0: send eof
debug2: channel 0: input drain -> closed
debug2: channel 0: send close
debug3: Wrote 72 bytes for a total of 2849
debug3: channel 0: will not send data after close
debug2: channel 0: rcvd close
debug3: channel 0: will not send data after close
debug2: channel 0: is dead
debug2: channel 0: gc: notify user
debug1: session_by_channel: session 0 channel 0
debug1: session_close_by_channel: channel 0 child 0
debug1: session_close: session 0 pid 0
debug3: session_unused: session id 0 unused
debug3: session_unused: session id 0 unused
debug2: channel 0: gc: user detached
debug2: channel 0: is dead
debug3: mm_request_receive entering
debug2: channel 0: garbage collecting
debug1: channel 0: free: server-session, nchannels 1
debug3: channel 0: status: The following connections are open:
  #0 server-session (t4 r0 i3/0 o3/0 fd -1/-1 cc -1)

debug3: channel 0: close_fds r -1 w -1 e -1
debug1: server_input_channel_open: ctype session rchan 1 win 2097152 max 32768
debug1: input_session_request
debug1: channel 0: new [server-session]
debug1: session_new: session 0
debug1: session_open: channel 0
debug1: session_open: session 0: link with channel 0
debug1: server_input_channel_open: confirm session
debug3: Wrote 52 bytes for a total of 2901
debug1: server_input_channel_req: channel 0 request exec reply 1
debug1: session_by_channel: session 0 channel 0
debug1: session_input_channel_req: session 0 req exec
debug3: mm_audit_run_command entering command scp -t -r '/tmp/scm_prepare_node.lHmTD4I5'
debug3: mm_request_send entering: type 62
debug3: mm_request_receive_expect entering: type 63
debug3: mm_request_receive entering
debug3: monitor_read: checking request 62
debug3: mm_answer_audit_command entering
debug1: session_new: session 0
debug3: mm_request_send entering: type 63
debug3: mm_request_receive entering
debug2: fd 10 setting O_NONBLOCK
debug2: fd 9 setting O_NONBLOCK
debug2: fd 12 setting O_NONBLOCK
debug3: Wrote 88 bytes for a total of 2989
debug2: channel 0: read 280 from efd 12
debug2: channel 0: rwin 2097152 elen 280 euse 1
debug2: channel 0: sent ext data 280
debug3: Wrote 324 bytes for a total of 3313
debug3: Wrote 52 bytes for a total of 3365
debug3: Wrote 52 bytes for a total of 3417

You can see from that debug trace that ssh does see an attempt made to copy the install file:

 

debug3: mm_audit_run_command entering command scp -t -r '/tmp/scm_prepare_node.lHmTD4I5'

 

And yet nothing ever gets written, despite the fact that sshd reports various numbers of bytes written:

 

root@bdprodm10:[86]:/tmp/scm_prepare_node.lHmTD4I5> ls -l
total 0
-rw-r--r-- 1 cminstall cminstall 0 May  3 14:40 scm_prepare_node.sh

Still have a hunch that it's an ssh configuration issue, but still looking as to what that might be.

 

Thanks for reading all of this 🙂

Chris

 

avatar
Contributor

Another update.

 

I've compared both the ssh_config file and sshd_config file from other servers with this one, and everything is the same.  I'm batting 0 here.

 

I'm wondering if there is some compatibility issue between the version of the Java SSH libs used by CM and the version of OpenSSH on my CentOS 6.7 servers?

 

2016-05-03 20:02:21,468 INFO NodeScannerThread-6:net.schmizz.sshj.transport.TransportImpl: Client identity string: SSH-2.0-SSHJ_0_14_0
2016-05-03 20:02:21,477 INFO NodeScannerThread-6:net.schmizz.sshj.transport.TransportImpl: Server identity string: SSH-2.0-OpenSSH_5.3

I'm at a loss, completely.

 

avatar
Contributor

Kinda at the "grasping at straws" point now.

 

I had read on the https://github.com/hierynomus/sshj bug list something about using the unlimited JCE security jars, and a warning about "Illegal key size", which I did see in the CM logs, but dismissed as unimportant. 

 

Just for fun, I installed the unlimited JCE jars from http://www.oracle.com/technetwork/java/javase/downloads/jce-6-download-429243.html on both the CM server and the agent server, but then CM wouldn't even start.  It complained about "

Caused by: java.lang.NoClassDefFoundError: Could not initialize class javax.crypto.JceSecurity"

 

Sooo, guess that's not the problem either.

 

Chris

avatar
Master Collaborator

how is your /tmp mounted? just noticed that "scm_prepare_node.sh" is 644 where it should be 755

avatar
Contributor

Thanks for the reply Michalis!

 

/tmp is 777:

 

root@bdprodm10:[134]:/tmp> ls -la
total 203756
drwxrwxrwt. 14 root      root           4096 May  3 20:41 .

I noticed on a working CM install that the next step in the install process is a CHMOD step that I never reach, because the COPY step never completes.  I think that is why the permissions are 644 on the .sh file, and 700 on the directory itself:

 

root@bdprodm10:[136]:/tmp> ls -l | grep scm_prepare_node.mWfGt4sr
drwx------  2 cminstall cminstall      4096 May  3 20:22 scm_prepare_node.mWfGt4sr
root@bdprodm10:[137]:/tmp> ls -l scm_prepare_node.mWfGt4sr
total 0
-rw-r--r-- 1 cminstall cminstall 0 May  3 20:22 scm_prepare_node.sh

Again, thank you for your help!

avatar
Master Collaborator

What's your network MTU size between your network?