Created on 05-02-2016 03:13 PM - edited 09-16-2022 03:16 AM
Hello everyone.
I'm going through "Installation path B" for a production cluster. MySQL installed and configured fine, Cloudera Manager starts up fine, and I'm going through the familiar steps of choosing hosts in the cluster on which to have CM install the agent software.
I get to the "Install Step" in CM, and one of the very first thing it does is "Copying installation files" to the other hosts. This process never completes, and the file on the target system just sits at 0 bytes forever until the process times out in CM.
My /etc/hosts file seems correct to me:
CM Server host:
root@bdprodm09:[57]:~> python -c "import socket; print socket.getfqdn(); print socket.gethostbyname(socket.getfqdn())"
bdprodm09.dbhotelcloud.com
172.0.30.1
Agent host:
root@bdprodm10:[37]:~> python -c "import socket; print socket.getfqdn(); print socket.gethostbyname(socket.getfqdn())"
bdprodm10.dbhotelcloud.com
172.0.30.2
I traced the packets with tcpdump between those two IPs, and port 22, and see the traffic, and no errors. But, in the target /tmp/XXX directory, I just get this:
root@bdprodm10:[52]:/tmp/scm_prepare_node.f0oOug56> ls -l
total 0
-rw-r--r-- 1 root root 0 May 2 21:47 scm_prepare_node.sh
This particular time I tried it as root, but I've had the same thing happen with another user account (cminstall) that is set up with passwordless su - root as well:
root@bdprodm10:[57]:/tmp> ls -l scm_prepare_node.uF3EVkb4
total 0
-rw-r--r-- 1 cminstall cminstall 0 May 2 21:41 scm_prepare_node.sh
I tried becoming the 'cminstall' user and manually scp'ing a file from one host to the other, and it worked fine:
[cminstall@bdprodm09 ~]$ scp testfile cminstall@bdprodm10.dbhotelcloud.com:/tmp
cminstall@bdprodm10.dbhotelcloud.com's password:
testfile 100% 33 0.0KB/s 00:00
---
root@bdprodm10:[58]:/tmp> ls -l testfile
-rw-rw-r-- 1 cminstall cminstall 33 May 2 22:07 testfile
The cloudera-scm-server log file doesn't say anything that indicates what is failing:
2016-05-02 22:01:29,243 INFO 164956425@scm-web-36:com.cloudera.server.cmf.node.NodeConfiguratorService: Retrying configurator with id 0 2016-05-02 22:01:29,244 INFO 164956425@scm-web-36:com.cloudera.server.cmf.node.NodeConfiguratorService: Submitted configurator for bdprodm10.dbhotelcloud.com with id 1 2016-05-02 22:01:29,244 INFO NodeConfiguratorThread-6-1:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm10.dbhotelcloud.com: Transitioning from INIT (PT0.001S) to CONNECT 2016-05-02 22:01:29,245 INFO NodeConfiguratorThread-6-1:net.schmizz.sshj.transport.TransportImpl: Client identity string: SSH-2.0-SSHJ_0_14_0 2016-05-02 22:01:29,253 INFO NodeConfiguratorThread-6-1:net.schmizz.sshj.transport.TransportImpl: Server identity string: SSH-2.0-OpenSSH_5.3 2016-05-02 22:01:29,367 INFO NodeConfiguratorThread-6-1:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm10.dbhotelcloud.com: Transitioning from CONNECT (PT0.123S) to AUTHENTICATE 2016-05-02 22:01:29,458 INFO NodeConfiguratorThread-6-1:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm10.dbhotelcloud.com: Transitioning from AUTHENTICATE (PT0.091S) to MAKE_TEMP_DIR 2016-05-02 22:01:29,460 INFO NodeConfiguratorThread-6-1:com.cloudera.server.cmf.node.NodeConfigurator: Executing mktemp -d /tmp/scm_prepare_node.XXXXXXXX on bdprodm10.dbhotelcloud.com 2016-05-02 22:01:29,520 INFO NodeConfiguratorThread-6-1:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm10.dbhotelcloud.com: Transitioning from MAKE_TEMP_DIR (PT0.062S) to COPY_FILES
This is the first time I've had this problem! I've done this install many times in the past, and I'm out of things to check 😞
Hoping someone can shed some light on this one for me!
Thanks very much for your time.
Chris
Created on 05-03-2016 02:49 PM - edited 05-03-2016 02:55 PM
based on issues I've seen they most leads to jumbo framing configuration 😞 could you verify that is working [1]
alternative installation path is to install the packages manually on the node, then vi /etc/cloudera-scm-agent/config.ini and set the Hostname of the CM server - server_host=....
eg;
yum clean all
rpm --import http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
# below is for the latest CM5
# modify the baseurl in/etc/yum.repos.d/cloudera-manager.repo and point to the desire CM version
# example for version 5.5.1 baseurl=http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.5.1/
wget http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/cloudera-manager.repo -O /etc/yum.repos.d/cloudera-manager.repo
yum install -y oracle-j2sdk* cloudera-manager-{daemons,agent}
[1] https://www.mylesgray.com/hardware/test-jumbo-frames-working/
Created 05-03-2016 07:00 AM
Just an update.
I tried setting the logging level to TRACE in /etc/cloudera-scm-server/log4j.properties, but that did not give me any more information, unfortunately.
The last thing I get in the logs is still:
Transitioning from MAKE_TEMP_DIR (PT0.044S) to COPY_FILES
Hoping someone might have an idea!
Thanks again,
Chris
Created 05-03-2016 07:14 AM
Another update. 🙂
I tried running the installer on just the node that the CM server runs, and things got a *little* further, then died at the "Acquiring installation lock...." step:
2016-05-03 14:09:46,814 INFO 1458067857@scm-web-14:com.cloudera.server.cmf.node.NodeScannerService: Request 2 contains 1 nodes 2016-05-03 14:09:46,814 INFO 1458067857@scm-web-14:com.cloudera.server.cmf.node.NodeScannerService: Existing scan of node bdprodm09.dbhotelcloud.com is too old, rescanning 2016-05-03 14:09:46,815 INFO 1458067857@scm-web-14:com.cloudera.server.cmf.node.NodeScannerService: Finished submitting request 2 2016-05-03 14:09:46,815 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScanner: Beginning scan of node bdprodm09.dbhotelcloud.com and port 22 2016-05-03 14:09:46,816 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScanner: Canonical hostname is bdprodm09.dbhotelcloud.com 2016-05-03 14:09:46,816 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScanner: Address is 172.0.30.1 2016-05-03 14:09:46,816 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScanner: Connecting to remote host 2016-05-03 14:09:46,816 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScanner: Disconnecting from remote host 2016-05-03 14:09:46,816 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScanner: Connecting to ssh service on remote host 2016-05-03 14:09:46,817 INFO NodeScannerThread-3:net.schmizz.sshj.transport.TransportImpl: Client identity string: SSH-2.0-SSHJ_0_14_0 2016-05-03 14:09:46,825 INFO NodeScannerThread-3:net.schmizz.sshj.transport.TransportImpl: Server identity string: SSH-2.0-OpenSSH_5.3 2016-05-03 14:09:46,847 INFO 1458067857@scm-web-14:com.cloudera.server.cmf.node.NodeScannerService: Request 2 returning 0/1 scans 2016-05-03 14:09:46,950 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScanner: Disconnecting from ssh service on remote host 2016-05-03 14:09:46,950 INFO NodeScannerThread-3:net.schmizz.sshj.transport.TransportImpl: Disconnected - BY_APPLICATION 2016-05-03 14:09:46,950 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScanner: Connected to SSH on node bdprodm09.dbhotelcloud.com with port 22 (latency PT0S) 2016-05-03 14:09:46,950 INFO NodeScannerThread-3:com.cloudera.server.cmf.node.NodeScannerService: Request 2 observed finished scan of node bdprodm09.dbhotelcloud.com 2016-05-03 14:09:47,885 INFO 1458067857@scm-web-14:com.cloudera.server.cmf.node.NodeScannerService: Request 2 returning 1/1 scans 2016-05-03 14:10:05,991 INFO 1458067857@scm-web-14:com.cloudera.server.cmf.node.NodeConfiguratorService: Creating request with id 2 2016-05-03 14:10:05,991 INFO 1458067857@scm-web-14:com.cloudera.server.cmf.node.NodeConfiguratorService: Adding password-based configurator for bdprodm09.dbhotelcloud.com 2016-05-03 14:10:05,992 INFO 1458067857@scm-web-14:com.cloudera.server.cmf.node.NodeConfiguratorService: Submitted configurator for bdprodm09.dbhotelcloud.com with id 0 2016-05-03 14:10:05,992 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm09.dbhotelcloud.com: Transitioning from INIT (PT0.001S) to CONNECT 2016-05-03 14:10:05,992 INFO NodeConfiguratorThread-2-0:net.schmizz.sshj.transport.TransportImpl: Client identity string: SSH-2.0-SSHJ_0_14_0 2016-05-03 14:10:06,001 INFO NodeConfiguratorThread-2-0:net.schmizz.sshj.transport.TransportImpl: Server identity string: SSH-2.0-OpenSSH_5.3 2016-05-03 14:10:06,125 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm09.dbhotelcloud.com: Transitioning from CONNECT (PT0.133S) to AUTHENTICATE 2016-05-03 14:10:06,168 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm09.dbhotelcloud.com: Transitioning from AUTHENTICATE (PT0.043S) to MAKE_TEMP_DIR 2016-05-03 14:10:06,172 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfigurator: Executing mktemp -d /tmp/scm_prepare_node.XXXXXXXX on bdprodm09.dbhotelcloud.com 2016-05-03 14:10:06,194 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm09.dbhotelcloud.com: Transitioning from MAKE_TEMP_DIR (PT0.026S) to COPY_FILES 2016-05-03 14:10:08,108 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm09.dbhotelcloud.com: Transitioning from COPY_FILES (PT1.914S) to CHMOD 2016-05-03 14:10:08,110 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfigurator: Executing chmod a+x /tmp/scm_prepare_node.wI9Jc1pA/scm_prepare_node.sh on bdprodm09.dbhotelcloud.com 2016-05-03 14:10:08,130 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm09.dbhotelcloud.com: Transitioning from CHMOD (PT0.022S) to EXECUTE_SCRIPT 2016-05-03 14:10:08,170 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfigurator: Executing bash -c '/tmp/scm_prepare_node.wI9Jc1pA/scm_prepare_node.sh --server_version 5.7.0 --server_build 76 --packages /tmp/scm_prepare_node.wI9Jc1pA/packages.scm --always /tmp/scm_prepare_node.wI9Jc1pA/always_install.scm --x86_64 /tmp/scm_prepare_node.wI9Jc1pA/x86_64_packages.scm --skipImpala true --skipSolr true --skipKeyTrustee true --unlimitedJCE false --javaInstallStrategy NONE --agentUserMode ROOT | tee /tmp/scm_prepare_node.wI9Jc1pA/scm_prepare_node.log; exit ${PIPESTATUS[0]}' on bdprodm09.dbhotelcloud.com 2016-05-03 14:10:09,171 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm09.dbhotelcloud.com: Transitioning from EXECUTE_SCRIPT (PT1.041S) to SCRIPT_START 2016-05-03 14:10:09,171 INFO NodeConfiguratorThread-2-0:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm09.dbhotelcloud.com: Transitioning from SCRIPT_START (PT0S) to TAKE_LOCK 2016-05-03 14:11:03,009 ERROR reader:net.schmizz.sshj.transport.TransportImpl: Dying because - {} java.net.SocketException: Connection timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at java.net.SocketInputStream.read(SocketInputStream.java:170) at java.net.SocketInputStream.read(SocketInputStream.java:141) at net.schmizz.sshj.transport.Reader.run(Reader.java:50) 2016-05-03 14:11:03,012 INFO reader:net.schmizz.sshj.transport.TransportImpl: Disconnected - UNKNOWN 2016-05-03 14:11:03,015 INFO NodeConfiguratorThread-0-0:net.schmizz.sshj.transport.TransportImpl: Disconnected - BY_APPLICATION
Super confused here. It seems to always have something to do with the scp/ssh commands, but I cannot see why things are failing, and failing at different points of the installation. 😞
Still looking though....
Chris
Created 05-03-2016 07:29 AM
Another update.
I checked the sshd logs to see if there might be something reported there, but unfortunately there was very little, and nothing to indicate an error or problem:
May 3 14:24:35 bdprodm10 sshd[102316]: Accepted password for cminstall from 172.0.30.1 port 53100 ssh2 May 3 14:24:35 bdprodm10 sshd[102316]: pam_unix(sshd:session): session opened for user cminstall by (uid=0)
I'm still inclined to believe that there is something ssh related that is causing the issue, just not sure what it would be.
Chris
Created 05-03-2016 08:43 AM
Another update 🙂
I increased the debug level of sshd with "-ddd", and I can see things are trying to happen on the remote server, but still nothing really jumps out at me as to what the problem is. Also, I got an actual exception in the cloudera-scm-server log this time, which I probably wasn't patient enough to wait for in previous attempts:
2016-05-03 14:40:17,535 INFO 1048269425@scm-web-25:com.cloudera.server.cmf.node.NodeConfiguratorService: Retrying configurator with id 4 2016-05-03 14:40:17,536 INFO 1048269425@scm-web-25:com.cloudera.server.cmf.node.NodeConfiguratorService: Submitted configurator for bdprodm10.dbhotelcloud.com with id 5 2016-05-03 14:40:17,536 INFO NodeConfiguratorThread-3-5:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm10.dbhotelcloud.com: Transitioning from INIT (PT0.001S) to CONNECT 2016-05-03 14:40:17,537 INFO NodeConfiguratorThread-3-5:net.schmizz.sshj.transport.TransportImpl: Client identity string: SSH-2.0-SSHJ_0_14_0 2016-05-03 14:40:17,545 INFO NodeConfiguratorThread-3-5:net.schmizz.sshj.transport.TransportImpl: Server identity string: SSH-2.0-OpenSSH_5.3 2016-05-03 14:40:17,668 INFO NodeConfiguratorThread-3-5:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm10.dbhotelcloud.com: Transitioning from CONNECT (PT0.132S) to AUTHENTICATE 2016-05-03 14:40:17,712 INFO NodeConfiguratorThread-3-5:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm10.dbhotelcloud.com: Transitioning from AUTHENTICATE (PT0.044S) to MAKE_TEMP_DIR 2016-05-03 14:40:17,723 INFO NodeConfiguratorThread-3-5:com.cloudera.server.cmf.node.NodeConfigurator: Executing mktemp -d /tmp/scm_prepare_node.XXXXXXXX on bdprodm10.dbhotelcloud.com 2016-05-03 14:40:17,744 INFO NodeConfiguratorThread-3-5:com.cloudera.server.cmf.node.NodeConfiguratorProgress: bdprodm10.dbhotelcloud.com: Transitioning from MAKE_TEMP_DIR (PT0.032S) to COPY_FILES 2016-05-03 14:55:49,571 ERROR reader:net.schmizz.sshj.transport.TransportImpl: Dying because - {} java.net.SocketException: Connection timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at java.net.SocketInputStream.read(SocketInputStream.java:170) at java.net.SocketInputStream.read(SocketInputStream.java:141) at net.schmizz.sshj.transport.Reader.run(Reader.java:50) 2016-05-03 14:55:49,571 INFO reader:net.schmizz.sshj.transport.TransportImpl: Disconnected - UNKNOWN 2016-05-03 14:55:49,572 WARN NodeConfiguratorThread-3-5:com.cloudera.server.cmf.node.NodeConfigurator: Could not copy scm_prepare_node.sh to bdprodm10.dbhotelcloud.com net.schmizz.sshj.common.SSHException: Connection timed out at net.schmizz.sshj.common.SSHException$1.chain(SSHException.java:36) at net.schmizz.sshj.common.SSHException$1.chain(SSHException.java:29) at net.schmizz.sshj.transport.TransportImpl.die(TransportImpl.java:591) at net.schmizz.sshj.transport.Reader.run(Reader.java:68) Caused by: java.net.SocketException: Connection timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) at java.net.SocketInputStream.read(SocketInputStream.java:170) at java.net.SocketInputStream.read(SocketInputStream.java:141) at net.schmizz.sshj.transport.Reader.run(Reader.java:50)
In case someone happens to be an sshd guru, here are the details of the debug output I captured as well:
debug3: fd 5 is not O_NONBLOCK debug1: Server will not fork when running in debugging mode. debug3: send_rexec_state: entering fd = 8 config len 573 debug3: ssh_msg_send: type 0 debug3: send_rexec_state: done debug1: rexec start in 5 out 5 newsock 5 pipe -1 sock 8 debug1: inetd sockets after dupping: 3, 3 Connection from 172.0.30.1 port 53107 debug1: Client protocol version 2.0; client software version SSHJ_0_14_0 debug1: no match: SSHJ_0_14_0 debug1: Enabling compatibility mode for protocol 2.0 debug1: Local version string SSH-2.0-OpenSSH_5.3 debug2: fd 3 setting O_NONBLOCK debug2: Network child is on pid 103351 debug3: preauth child monitor started debug3: mm_request_receive entering debug3: privsep user:group 74:74 debug1: permanently_set_uid: 74/74 debug1: list_hostkey_types: ssh-rsa,ssh-dss debug1: SSH2_MSG_KEXINIT sent debug3: Wrote 840 bytes for a total of 861 debug1: SSH2_MSG_KEXINIT received debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1 debug2: kex_parse_kexinit: ssh-rsa,ssh-dss debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96 debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96 debug2: kex_parse_kexinit: none,zlib@openssh.com debug2: kex_parse_kexinit: none,zlib@openssh.com debug2: kex_parse_kexinit: debug2: kex_parse_kexinit: debug2: kex_parse_kexinit: first_kex_follows 0 debug2: kex_parse_kexinit: reserved 0 debug2: kex_parse_kexinit: curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256,ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1 debug2: kex_parse_kexinit: ecdsa-sha2-nistp256,ssh-rsa,ssh-dss,ssh-ed25519 debug2: kex_parse_kexinit: aes128-ctr,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,cast128-ctr,idea-cbc,idea-ctr,serpent128-cbc,serpent128-ctr,3des-ctr,twofish128-cbc,twofish128-ctr,arcfour,arcfour128 debug2: kex_parse_kexinit: aes128-ctr,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,cast128-ctr,idea-cbc,idea-ctr,serpent128-cbc,serpent128-ctr,3des-ctr,twofish128-cbc,twofish128-ctr,arcfour,arcfour128 debug2: kex_parse_kexinit: hmac-sha1,hmac-sha1-96,hmac-md5,hmac-md5-96,hmac-sha2-256,hmac-sha2-512 debug2: kex_parse_kexinit: hmac-sha1,hmac-sha1-96,hmac-md5,hmac-md5-96,hmac-sha2-256,hmac-sha2-512 debug2: kex_parse_kexinit: none debug2: kex_parse_kexinit: none debug2: kex_parse_kexinit: debug2: kex_parse_kexinit: debug2: kex_parse_kexinit: first_kex_follows 0 debug2: kex_parse_kexinit: reserved 0 debug2: mac_setup: found hmac-sha1 debug1: kex: client->server aes128-ctr hmac-sha1 none debug3: mm_request_send entering: type 78 debug3: mm_request_receive_expect entering: type 79 debug3: mm_request_receive entering debug3: monitor_read: checking request 78 debug3: mm_request_send entering: type 79 debug3: mm_request_receive entering debug2: mac_setup: found hmac-sha1 debug1: kex: server->client aes128-ctr hmac-sha1 none debug3: mm_request_send entering: type 78 debug3: mm_request_receive_expect entering: type 79 debug3: monitor_read: checking request 78 debug3: mm_request_receive entering debug3: mm_request_send entering: type 79 debug3: mm_request_receive entering debug1: SSH2_MSG_KEX_DH_GEX_REQUEST received debug3: mm_request_send entering: type 0 debug3: mm_choose_dh: waiting for MONITOR_ANS_MODULI debug3: mm_request_receive_expect entering: type 1 debug3: mm_request_receive entering debug3: monitor_read: checking request 0 debug3: mm_answer_moduli: got parameters: 1024 2048 8192 debug3: mm_request_send entering: type 1 debug2: monitor_read: 0 used once, disabling now debug3: mm_request_receive entering debug3: mm_choose_dh: remaining 0 debug1: SSH2_MSG_KEX_DH_GEX_GROUP sent debug3: Wrote 280 bytes for a total of 1141 debug2: dh_gen_key: priv key bits set: 170/320 debug2: bits set: 997/2048 debug1: expecting SSH2_MSG_KEX_DH_GEX_INIT debug2: bits set: 1035/2048 debug3: mm_key_sign entering debug3: mm_request_send entering: type 5 debug3: mm_key_sign: waiting for MONITOR_ANS_SIGN debug3: mm_request_receive_expect entering: type 6 debug3: mm_request_receive entering debug3: monitor_read: checking request 5 debug3: mm_answer_sign debug3: mm_answer_sign: signature 0x7f0e958c3090(271) debug3: mm_request_send entering: type 6 debug2: monitor_read: 5 used once, disabling now debug3: mm_request_receive entering debug1: SSH2_MSG_KEX_DH_GEX_REPLY sent debug2: kex_derive_keys debug2: set_newkeys: mode 1 debug1: SSH2_MSG_NEWKEYS sent debug1: expecting SSH2_MSG_NEWKEYS debug3: Wrote 848 bytes for a total of 1989 debug2: set_newkeys: mode 0 debug1: SSH2_MSG_NEWKEYS received debug1: KEX done debug3: Wrote 52 bytes for a total of 2041 debug1: userauth-request for user cminstall service ssh-connection method password debug1: attempt 0 failures 0 debug3: mm_getpwnamallow entering debug3: mm_request_send entering: type 7 debug3: mm_getpwnamallow: waiting for MONITOR_ANS_PWNAM debug3: mm_request_receive_expect entering: type 8 debug3: mm_request_receive entering debug3: monitor_read: checking request 7 debug3: mm_answer_pwnamallow debug2: parse_server_config: config reprocess config len 573 debug3: mm_answer_pwnamallow: sending MONITOR_ANS_PWNAM: 1 debug3: mm_request_send entering: type 8 debug2: monitor_read: 7 used once, disabling now debug3: mm_request_receive entering debug2: input_userauth_request: setting up authctxt for cminstall debug3: mm_start_pam entering debug3: mm_request_send entering: type 50 debug3: mm_inform_authserv entering debug3: monitor_read: checking request 50 debug3: mm_request_send entering: type 3 debug1: PAM: initializing for "cminstall" debug3: mm_inform_authrole entering debug3: mm_request_send entering: type 4 debug2: input_userauth_request: try method password debug3: mm_auth_password entering debug3: mm_request_send entering: type 11 debug3: mm_auth_password: waiting for MONITOR_ANS_AUTHPASSWORD debug3: mm_request_receive_expect entering: type 12 debug3: mm_request_receive entering debug1: PAM: setting PAM_RHOST to "172.0.30.1" debug1: PAM: setting PAM_TTY to "ssh" debug2: monitor_read: 50 used once, disabling now debug3: mm_request_receive entering debug3: monitor_read: checking request 3 debug3: mm_answer_authserv: service=ssh-connection, style= debug2: monitor_read: 3 used once, disabling now debug3: mm_request_receive entering debug3: monitor_read: checking request 4 debug3: mm_answer_authrole: role= debug2: monitor_read: 4 used once, disabling now debug3: mm_request_receive entering debug3: monitor_read: checking request 11 debug3: PAM: sshpam_passwd_conv called with 1 messages debug1: PAM: password authentication accepted for cminstall debug3: mm_answer_authpassword: sending result 1 debug3: mm_request_send entering: type 12 debug3: mm_request_receive_expect entering: type 51 debug3: mm_request_receive entering debug3: mm_auth_password: user authenticated debug3: mm_do_pam_account entering debug3: mm_request_send entering: type 51 debug3: mm_request_receive_expect entering: type 52 debug3: mm_request_receive entering debug1: do_pam_account: called debug3: PAM: do_pam_account pam_acct_mgmt = 0 (Success) debug3: mm_request_send entering: type 52 Accepted password for cminstall from 172.0.30.1 port 53107 ssh2 debug1: monitor_child_preauth: cminstall has been authenticated by privileged process debug3: mm_do_pam_account returning 1 debug3: mm_get_keystate: Waiting for new keys debug3: mm_request_receive_expect entering: type 25 debug3: mm_request_receive entering debug3: Wrote 36 bytes for a total of 2077 debug3: mm_send_keystate: Sending new keys: 0x7f0e958c5130 0x7f0e958c29a0 debug3: mm_newkeys_to_blob: converting 0x7f0e958c5130 debug3: mm_newkeys_to_blob: converting 0x7f0e958c29a0 debug3: mm_send_keystate: New keys have been sent debug3: mm_send_keystate: Sending compression state debug3: mm_request_send entering: type 25 debug3: mm_send_keystate: Finished sending state debug3: mm_newkeys_from_blob: 0x7f0e958d5160(127) debug3: mm_request_send entering: type 80 debug3: mm_request_receive_expect entering: type 81 debug2: mac_setup: found hmac-sha1 debug3: mm_request_receive entering debug3: mm_get_keystate: Waiting for second key debug3: mm_newkeys_from_blob: 0x7f0e958d5160(127) debug2: mac_setup: found hmac-sha1 debug3: mm_get_keystate: Getting compression state debug3: mm_get_keystate: Getting Network I/O buffers debug3: mm_request_receive_expect entering: type 80 debug3: mm_request_receive entering debug3: mm_request_send entering: type 81 debug3: mm_share_sync: Share sync debug3: mm_share_sync: Share sync end debug1: temporarily_use_uid: 1000/1000 (e=0/0) debug1: ssh_gssapi_storecreds: Not a GSSAPI mechanism debug1: restore_uid: 0/0 debug1: SELinux support disabled debug1: PAM: establishing credentials debug3: PAM: opening session debug1: temporarily_use_uid: 1000/1000 (e=0/0) debug1: ssh_gssapi_storecreds: Not a GSSAPI mechanism debug1: restore_uid: 0/0 User child is on pid 103352 debug3: mm_request_receive entering debug1: PAM: establishing credentials debug1: permanently_set_uid: 1000/1000 debug2: set_newkeys: mode 0 debug2: set_newkeys: mode 1 debug1: Entering interactive session for SSH2. debug2: fd 6 setting O_NONBLOCK debug2: fd 7 setting O_NONBLOCK debug1: server_init_dispatch_20 debug1: server_input_channel_open: ctype session rchan 0 win 2097152 max 32768 debug1: input_session_request debug1: channel 0: new [server-session] debug2: session_new: allocate (allocated 0 max 10) debug3: session_unused: session id 0 unused debug1: session_new: session 0 debug1: session_open: channel 0 debug1: session_open: session 0: link with channel 0 debug1: server_input_channel_open: confirm session debug3: Wrote 52 bytes for a total of 2129 debug1: server_input_channel_req: channel 0 request pty-req reply 1 debug1: session_by_channel: session 0 channel 0 debug1: session_input_channel_req: session 0 req pty-req debug1: Allocating pty. debug3: mm_request_send entering: type 26 debug3: mm_pty_allocate: waiting for MONITOR_ANS_PTY debug3: mm_request_receive_expect entering: type 27 debug3: monitor_read: checking request 26 debug3: mm_request_receive entering debug3: mm_answer_pty entering debug2: session_new: allocate (allocated 0 max 10) debug3: session_unused: session id 0 unused debug1: session_new: session 0 debug3: mm_request_send entering: type 27 debug3: mm_answer_pty: tty /dev/pts/2 ptyfd 5 debug1: session_pty_req: session 0 alloc /dev/pts/2 debug3: mm_request_receive entering debug3: Wrote 36 bytes for a total of 2165 debug1: server_input_channel_req: channel 0 request exec reply 1 debug1: session_by_channel: session 0 channel 0 debug1: session_input_channel_req: session 0 req exec debug2: fd 3 setting TCP_NODELAY debug2: channel 0: rfd 10 isatty debug2: fd 10 setting O_NONBLOCK debug3: fd 8 is O_NONBLOCK debug3: Wrote 88 bytes for a total of 2253 debug1: Setting controlling tty using TIOCSCTTY. debug3: Wrote 372 bytes for a total of 2625 debug1: Received SIGCHLD. debug1: session_by_pid: pid 103353 debug1: session_exit_message: session 0 channel 0 pid 103353 debug2: channel 0: request exit-status confirm 0 debug1: session_exit_message: release channel 0 debug2: channel 0: write failed debug2: channel 0: close_write debug2: channel 0: send eow debug2: channel 0: output open -> closed debug3: mm_request_send entering: type 28 debug3: monitor_read: checking request 28 debug3: mm_answer_pty_cleanup entering debug2: notify_done: reading debug1: session_by_tty: session 0 tty /dev/pts/2 debug2: channel 0: read<=0 rfd 10 len -1 debug3: mm_session_close: session 0 pid 103352 debug2: channel 0: read failed debug3: mm_session_close: tty /dev/pts/2 ptyfd 5 debug2: channel 0: close_read debug1: session_pty_cleanup: session 0 release /dev/pts/2 debug2: channel 0: input open -> drain debug3: Wrote 152 bytes for a total of 2777 debug2: channel 0: ibuf empty debug2: channel 0: send eof debug2: channel 0: input drain -> closed debug2: channel 0: send close debug3: Wrote 72 bytes for a total of 2849 debug3: channel 0: will not send data after close debug2: channel 0: rcvd close debug3: channel 0: will not send data after close debug2: channel 0: is dead debug2: channel 0: gc: notify user debug1: session_by_channel: session 0 channel 0 debug1: session_close_by_channel: channel 0 child 0 debug1: session_close: session 0 pid 0 debug3: session_unused: session id 0 unused debug3: session_unused: session id 0 unused debug2: channel 0: gc: user detached debug2: channel 0: is dead debug3: mm_request_receive entering debug2: channel 0: garbage collecting debug1: channel 0: free: server-session, nchannels 1 debug3: channel 0: status: The following connections are open: #0 server-session (t4 r0 i3/0 o3/0 fd -1/-1 cc -1) debug3: channel 0: close_fds r -1 w -1 e -1 debug1: server_input_channel_open: ctype session rchan 1 win 2097152 max 32768 debug1: input_session_request debug1: channel 0: new [server-session] debug1: session_new: session 0 debug1: session_open: channel 0 debug1: session_open: session 0: link with channel 0 debug1: server_input_channel_open: confirm session debug3: Wrote 52 bytes for a total of 2901 debug1: server_input_channel_req: channel 0 request exec reply 1 debug1: session_by_channel: session 0 channel 0 debug1: session_input_channel_req: session 0 req exec debug3: mm_audit_run_command entering command scp -t -r '/tmp/scm_prepare_node.lHmTD4I5' debug3: mm_request_send entering: type 62 debug3: mm_request_receive_expect entering: type 63 debug3: mm_request_receive entering debug3: monitor_read: checking request 62 debug3: mm_answer_audit_command entering debug1: session_new: session 0 debug3: mm_request_send entering: type 63 debug3: mm_request_receive entering debug2: fd 10 setting O_NONBLOCK debug2: fd 9 setting O_NONBLOCK debug2: fd 12 setting O_NONBLOCK debug3: Wrote 88 bytes for a total of 2989 debug2: channel 0: read 280 from efd 12 debug2: channel 0: rwin 2097152 elen 280 euse 1 debug2: channel 0: sent ext data 280 debug3: Wrote 324 bytes for a total of 3313 debug3: Wrote 52 bytes for a total of 3365 debug3: Wrote 52 bytes for a total of 3417
You can see from that debug trace that ssh does see an attempt made to copy the install file:
debug3: mm_audit_run_command entering command scp -t -r '/tmp/scm_prepare_node.lHmTD4I5'
And yet nothing ever gets written, despite the fact that sshd reports various numbers of bytes written:
root@bdprodm10:[86]:/tmp/scm_prepare_node.lHmTD4I5> ls -l total 0 -rw-r--r-- 1 cminstall cminstall 0 May 3 14:40 scm_prepare_node.sh
Still have a hunch that it's an ssh configuration issue, but still looking as to what that might be.
Thanks for reading all of this 🙂
Chris
Created 05-03-2016 01:07 PM
Another update.
I've compared both the ssh_config file and sshd_config file from other servers with this one, and everything is the same. I'm batting 0 here.
I'm wondering if there is some compatibility issue between the version of the Java SSH libs used by CM and the version of OpenSSH on my CentOS 6.7 servers?
2016-05-03 20:02:21,468 INFO NodeScannerThread-6:net.schmizz.sshj.transport.TransportImpl: Client identity string: SSH-2.0-SSHJ_0_14_0 2016-05-03 20:02:21,477 INFO NodeScannerThread-6:net.schmizz.sshj.transport.TransportImpl: Server identity string: SSH-2.0-OpenSSH_5.3
I'm at a loss, completely.
Created 05-03-2016 01:29 PM
Kinda at the "grasping at straws" point now.
I had read on the https://github.com/hierynomus/sshj bug list something about using the unlimited JCE security jars, and a warning about "Illegal key size", which I did see in the CM logs, but dismissed as unimportant.
Just for fun, I installed the unlimited JCE jars from http://www.oracle.com/technetwork/java/javase/downloads/jce-6-download-429243.html on both the CM server and the agent server, but then CM wouldn't even start. It complained about "
Caused by: java.lang.NoClassDefFoundError: Could not initialize class javax.crypto.JceSecurity"
Sooo, guess that's not the problem either.
Chris
Created 05-03-2016 01:50 PM
how is your /tmp mounted? just noticed that "scm_prepare_node.sh" is 644 where it should be 755
Created 05-03-2016 01:55 PM
Thanks for the reply Michalis!
/tmp is 777:
root@bdprodm10:[134]:/tmp> ls -la total 203756 drwxrwxrwt. 14 root root 4096 May 3 20:41 .
I noticed on a working CM install that the next step in the install process is a CHMOD step that I never reach, because the COPY step never completes. I think that is why the permissions are 644 on the .sh file, and 700 on the directory itself:
root@bdprodm10:[136]:/tmp> ls -l | grep scm_prepare_node.mWfGt4sr drwx------ 2 cminstall cminstall 4096 May 3 20:22 scm_prepare_node.mWfGt4sr root@bdprodm10:[137]:/tmp> ls -l scm_prepare_node.mWfGt4sr total 0 -rw-r--r-- 1 cminstall cminstall 0 May 3 20:22 scm_prepare_node.sh
Again, thank you for your help!
Created 05-03-2016 02:26 PM
What's your network MTU size between your network?