Created 06-28-2017 09:19 AM
This is on fresh servers. After installing the cloudera-workbench RPM, I ran "cdsw init" and got the output below:
[root@hostname ~]# cdsw init
Using user-specified config file: /etc/cdsw/config/cdsw.conf
Prechecking OS Version........[OK]
Prechecking scaling limits for processes........[OK]
Prechecking scaling limits for open files........
WARNING: Cloudera Data Science Workbench recommends that all users have a max-open-files limit set to 1048576.
It is currently set to [65535] as per 'ulimit -n'
Press enter to continue
Prechecking that iptables are not configured........
WARNING: Cloudera Data Science Workbench requires iptables, but does not support preexisting iptables rules.
Press enter to continue
Prechecking that SELinux is disabled........[OK]
Prechecking configured block devices and mountpoints........[OK]
Prechecking kernel parameters........[OK]
Prechecking that docker block devices are of adequate size........[OK]
Prechecking that application block devices are of adequate size........[OK]
Prechecking size of root volume........
WARNING: The recommended minimum root volume size is 100G.
Press enter to continue
Prechecking that CDH gateway roles are configured........[OK]
Prechecking that /etc/krb5 file is not a placeholder........
WARNING: The Kerberos configuration file [/etc/krb5.conf] seems to be a placeholder. If your CDH cluster is Kerberized, please copy /etc/krb5.conf to these Cloudera Data Science Workbench nodes.
Press enter to continue.
Prechecking parcel paths........
WARNING: CDH parcels not found at /opt/cloudera/parcels. If you are using a custom parcel directory, please set it in the Cloudera Data Science Workbench admin panel once the site is running. Otherwise, please add your Cloudera Data Science Workbench nodes to your CDH cluster.
Press enter to continue.
Prechecking CDH client configurations........
WARNING: CDH client configuration not found at /etc/spark2-conf.
Press enter to continue
Prechecking Java version........[OK]
Prechecking Java distribution........
WARNING: OpenJDK is not supported.
Press enter to continue
Creating docker thinpool if it does not exist
--- Logical volume ---
LV Name thinpool
VG Name docker
LV UUID E0BUAz-vz2P-DoPB-5u9w-Zjjl-wPhQ-deiZqt
LV Write Access read/write
LV Creation host, time hostname, 2017-06-28 11:08:49 +0000
LV Pool metadata thinpool_tmeta
LV Pool data thinpool_tdata
LV Status available
# open 0
LV Size 5.19 TiB
Allocated pool data 0.00%
Allocated metadata 0.02%
Current LE 1359251
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:6
Docker thinpool already configured.
Initialize application storage at /var/lib/cdsw
Disabling node with IP [10.*.*.*]...
Node [10.*.*.*] removed from nfs export list successfully.
Stopping rpc-statd...
Stopping nfs-idmapd...
Stopping rpcbind...
Stopping nfs-server...
Removing entry from /etc/fstab...
Unmounting [/dev/sdc1]...
Skipping format since volumes are already set correctly.
Adding entry to /etc/fstab...
Mounting [/var/lib/cdsw]...
Starting rpc-statd...
Enabling rpc-statd...
Starting nfs-idmapd...
Enabling nfs-idmapd...
Starting rpcbind...
Enabling rpcbind...
Starting nfs-server...
Enabling nfs-server...
Enabling node with IP [10.*.*.*]...
Node [10.*.*.*] added to nfs export list successfully.
Starting rpc-statd...
Enabling rpc-statd...
Starting nfs-idmapd...
Enabling nfs-idmapd...
Starting rpcbind...
Enabling rpcbind...
Starting nfs-server...
Enabling nfs-server...
Starting docker...
Enabling docker...
Starting ntpd...
Enabling ntpd...
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /etc/systemd/system/kubelet.service.
Initializing cluster...
Running pre-flight checks
<master/tokens> generated token: "1df27c.244acfe17e00a402"
<master/pki> generated Certificate Authority key and certificate:
Issuer: CN=kubernetes | Subject: CN=kubernetes | CA: true
Not before: 2017-06-28 16:07:45 +0000 UTC
Not After: 2027-06-26 16:07:45 +0000 UTC
Public: /etc/kubernetes/pki/ca-pub.pem
Private: /etc/kubernetes/pki/ca-key.pem
Cert: /etc/kubernetes/pki/ca.pem
<master/pki> generated API Server key and certificate:
Issuer: CN=kubernetes | Subject: CN=kube-apiserver | CA: false
Not before: 2017-06-28 16:07:45 +0000 UTC
Not After: 2018-06-28 16:07:46 +0000 UTC
Alternate Names: [10.*.*.* 100.77.0.1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local]
Public: /etc/kubernetes/pki/apiserver-pub.pem
Private: /etc/kubernetes/pki/apiserver-key.pem
Cert: /etc/kubernetes/pki/apiserver.pem
<master/pki> generated Service Account Signing keys:
Public: /etc/kubernetes/pki/sa-pub.pem
Private: /etc/kubernetes/pki/sa-key.pem
<master/pki> created keys and certificates in "/etc/kubernetes/pki"
<util/kubeconfig> created "/etc/kubernetes/kubelet.conf"
<util/kubeconfig> created "/etc/kubernetes/admin.conf"
<master/apiclient> created API client configuration
<master/apiclient> created API client, waiting for the control plane to become ready
It stopped at:
created API client, waiting for the control plane to become ready
Can anyone help with this?
Created 06-28-2017 09:58 AM
Created 06-28-2017 10:14 AM
Below is the output of the mentioned commands:
[root@hostname ~]# systemctl status docker
● docker.service - docker
Loaded: loaded (/etc/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2017-06-28 16:07:41 GMT; 59min ago
Docs: https://docs.docker.com
Main PID: 136608 (dockerd)
Memory: 25.7M
CGroup: /system.slice/docker.service
├─136608 dockerd --log-driver=journald -s devicemapper --storage-opt dm.basesize=100G --storage-opt dm.thinpooldev=/dev/mapper/docker-thinpool --storage-opt dm.use_deferred_rem...
└─136625 docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/cont...
Jun 28 17:06:53 hostname docker[136608]: time="2017-06-28T17:06:53.813294239Z" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0...-amd64:3.0"
Jun 28 17:06:55 hostname docker[136608]: time="2017-06-28T17:06:55.512839511Z" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0...-amd64:3.0"
Jun 28 17:07:00 hostname docker[136608]: time="2017-06-28T17:07:00.813523993Z" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0...-amd64:3.0"
Jun 28 17:07:01 hostname docker[136608]: time="2017-06-28T17:07:01.986498328Z" level=warning msg="Error getting v2 registry: Get https://gcr.io/v2/: net/http: T...ke timeout"
Jun 28 17:07:01 hostname docker[136608]: time="2017-06-28T17:07:01.986557602Z" level=error msg="Attempting next endpoint for pull after error: Get https://gcr.i...ke timeout"
Jun 28 17:07:05 hostname docker[136608]: time="2017-06-28T17:07:05.813261563Z" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0...-amd64:3.0"
Jun 28 17:07:08 hostname docker[136608]: time="2017-06-28T17:07:08.512474126Z" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0...-amd64:3.0"
Jun 28 17:07:11 hostname docker[136608]: time="2017-06-28T17:07:11.813302105Z" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0...-amd64:3.0"
Jun 28 17:07:12 hostname docker[136608]: time="2017-06-28T17:07:12.078578325Z" level=error msg="Attempting next endpoint for pull after error: Get https://gcr.i...ke timeout"
Jun 28 17:07:12 hostname docker[136608]: time="2017-06-28T17:07:12.078630585Z" level=error msg="Handler for POST /images/create returned error: Get https://gcr....ke timeout"
Hint: Some lines were ellipsized, use -l to show in full.
[root@hostname ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.repository.cloudera.com/cdsw/1.0.1/third-party/weaveexec 1.9.0 300f92429697 4 months ago 90.4 MB
[root@hostname~]# journalctl -u docker
-- Logs begin at Wed 2017-06-28 12:13:22 GMT, end at Wed 2017-06-28 17:07:39 GMT. --
Jun 28 12:15:34 hostname systemd[1]: Starting docker...
Jun 28 12:15:34 hostname docker[26682]: Command "daemon" is deprecated, and will be removed in Docker 1.16. Please run `dockerd` directly.
Jun 28 12:15:34 hostname docker[26682]: time="2017-06-28T12:15:34.677250601Z" level=info msg="libcontainerd: new containerd process, pid: 26751"
Jun 28 12:15:35 hostname docker[26682]: time="2017-06-28T12:15:35.871080507Z" level=warning msg="devmapper: Base device already exists and has filesystem xfs on it. User spec
Jun 28 12:15:35 hostname docker[26682]: time="2017-06-28T12:15:35.919664135Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Jun 28 12:15:35 hostname docker[26682]: time="2017-06-28T12:15:35.920182783Z" level=warning msg="mountpoint for pids not found"
Jun 28 12:15:35 hostname docker[26682]: time="2017-06-28T12:15:35.920875579Z" level=info msg="Loading containers: start."
Jun 28 12:15:35 hostname docker[26682]: time="2017-06-28T12:15:35.952827173Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon op
Jun 28 12:15:35 hostname docker[26682]: time="2017-06-28T12:15:35.967289128Z" level=info msg="Loading containers: done."
Jun 28 12:15:36 hostname docker[26682]: time="2017-06-28T12:15:36.010477234Z" level=info msg="Daemon has completed initialization"
Jun 28 12:15:36 hostname docker[26682]: time="2017-06-28T12:15:36.010527297Z" level=info msg="Docker daemon" commit=49bf474 graphdriver=devicemapper version=1.13.0
Jun 28 12:15:36 hostname systemd[1]: Started docker.
Jun 28 12:15:36 hostname docker[26682]: time="2017-06-28T12:15:36.024593232Z" level=info msg="API listen on /var/run/docker.sock"
Jun 28 12:15:37 hostname docker[26682]: time="2017-06-28T12:15:37.400330485Z" level=warning msg="mountpoint for pids not found"
Jun 28 12:15:38 hostname docker[26682]: time="2017-06-28T12:15:38.450533401Z" level=error msg="Handler for DELETE /v1.21/networks/weave returned error: network weave not foun
Jun 28 12:15:38 hostname docker[26682]: time="2017-06-28T12:15:38.518085080Z" level=error msg="Error setting up exec command in container weaveproxy: No such container: weave
Jun 28 12:15:38 hostname docker[26682]: time="2017-06-28T12:15:38.518139532Z" level=error msg="Handler for POST /v1.22/containers/weaveproxy/exec returned error: No such cont
Jun 28 12:15:38 hostname docker[26682]: time="2017-06-28T12:15:38.585161570Z" level=error msg="Handler for GET /v1.22/containers/weave/json returned error: No such container:
Jun 28 12:15:38 hostname docker[26682]: time="2017-06-28T12:15:38.585823308Z" level=error msg="Handler for GET /v1.22/images/weave/json returned error: No such image: weave"
Jun 28 12:15:38 hostname docker[26682]: time="2017-06-28T12:15:38.645337877Z" level=error msg="Handler for POST /v1.22/containers/weaveplugin/stop returned error: No such con
Jun 28 12:15:38 hostname docker[26682]: time="2017-06-28T12:15:38.702047141Z" level=error msg="Handler for DELETE /v1.22/containers/weaveplugin returned error: No such contai
Jun 28 12:15:38 hostname docker[26682]: time="2017-06-28T12:15:38.755125181Z" level=error msg="Handler for POST /v1.22/containers/weave/stop returned error: No such container
Jun 28 12:15:38 hostname docker[26682]: time="2017-06-28T12:15:38.820383586Z" level=error msg="Handler for DELETE /v1.22/containers/weave returned error: No such container: w
Jun 28 12:15:38 hostname docker[26682]: time="2017-06-28T12:15:38.866553352Z" level=error msg="Handler for POST /v1.22/containers/weaveproxy/stop returned error: No such cont
Jun 28 12:15:38 hostname docker[26682]: time="2017-06-28T12:15:38.934899070Z" level=error msg="Handler for DELETE /v1.22/containers/weaveproxy returned error: No such contain
Jun 28 12:15:38 hostname docker[26682]: time="2017-06-28T12:15:38.991199157Z" level=error msg="Handler for GET /v1.22/containers/weaveplugin/json returned error: No such cont
Jun 28 12:15:38 hostname docker[26682]: time="2017-06-28T12:15:38.991688746Z" level=error msg="Handler for GET /v1.22/images/weaveplugin/json returned error: No such image: w
Jun 28 12:15:39 hostname docker[26682]: time="2017-06-28T12:15:39.945499212Z" level=error msg="containerd: deleting container" error="exit status 1: \"container 55fa740127dbb
Jun 28 12:15:52 hostname docker[26682]: time="2017-06-28T12:15:52.260203323Z" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned
Jun 28 12:15:52 hostname docker[26682]: time="2017-06-28T12:15:52.554632409Z" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned
Jun 28 12:15:52 hostname docker[26682]: time="2017-06-28T12:15:52.555863242Z" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned
Jun 28 12:15:52 hostname docker[26682]: time="2017-06-28T12:15:52.557338775Z" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned
Jun 28 12:16:02 hostname docker[26682]: time="2017-06-28T12:16:02.458600197Z" level=warning msg="Error getting v2 registry: Get https://gcr.io/v2/: net/http: TLS handshake ti
Jun 28 12:16:02 hostname docker[26682]: time="2017-06-28T12:16:02.458659007Z" level=error msg="Attempting next endpoint for pull after error: Get https://gcr.io/v2/: net/http
Jun 28 12:16:12 hostname docker[26682]: time="2017-06-28T12:16:12.551326574Z" level=error msg="Attempting next endpoint for pull after error: Get https://gcr.io/v1/_ping: net
Jun 28 12:16:12 hostname docker[26682]: time="2017-06-28T12:16:12.551407914Z" level=error msg="Handler for POST /images/create returned error: Get https://gcr.io/v1/_ping: ne
Jun 28 12:16:22 hostname docker[26682]: time="2017-06-28T12:16:22.724526575Z" level=warning msg="Error getting v2 registry: Get https://gcr.io/v2/: net/http: TLS handshake ti
Jun 28 12:16:22 hostname docker[26682]: time="2017-06-28T12:16:22.724587795Z" level=error msg="Attempting next endpoint for pull after error: Get https://gcr.io/v2/: net/http
Jun 28 12:16:26 hostname docker[26682]: time="2017-06-28T12:16:26.250424062Z" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned
Jun 28 12:16:32 hostname docker[26682]: time="2017-06-28T12:16:32.817042449Z" level=error msg="Attempting next endpoint for pull after error: Get https://gcr.io/v1/_ping: net
Jun 28 12:16:32 hostname docker[26682]: time="2017-06-28T12:16:32.817107756Z" level=error msg="Handler for POST /images/create returned error: Get https://gcr.io/v1/_ping: ne
Jun 28 12:16:42 hostname docker[26682]: time="2017-06-28T12:16:42.990655817Z" level=warning msg="Error getting v2 registry: Get https://gcr.io/v2/: net/http: TLS handshake ti
Jun 28 12:16:42 hostname docker[26682]: time="2017-06-28T12:16:42.990714640Z" level=error msg="Attempting next endpoint for pull after error: Get https://gcr.io/v2/: net/http
Jun 28 12:16:45 hostname docker[26682]: time="2017-06-28T12:16:45.550698377Z" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned
Jun 28 12:16:53 hostname docker[26682]: time="2017-06-28T12:16:53.083222674Z" level=error msg="Attempting next endpoint for pull after error: Get https://gcr.io/v1/_ping: net
Jun 28 12:16:53 hostname docker[26682]: time="2017-06-28T12:16:53.083296701Z" level=error msg="Handler for POST /images/create returned error: Get https://gcr.io/v1/_ping: ne
Jun 28 12:17:03 hostname docker[26682]: time="2017-06-28T12:17:03.257122769Z" level=warning msg="Error getting v2 registry: Get https://gcr.io/v2/: net/http: TLS handshake ti
Jun 28 12:17:03 hostname docker[26682]: time="2017-06-28T12:17:03.257220350Z" level=error msg="Attempting next endpoint for pull after error: Get https://gcr.io/v2/: net/http
Jun 28 12:17:07 hostname docker[26682]: time="2017-06-28T12:17:07.550742117Z" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned
Jun 28 12:17:13 hostname docker[26682]: time="2017-06-28T12:17:13.349490853Z" level=error msg="Attempting next endpoint for pull after error: Get https://gcr.io/v1/_ping: net
Jun 28 12:17:13 hostname docker[26682]: time="2017-06-28T12:17:13.349596870Z" level=error msg="Handler for POST /images/create returned error: Get https://gcr.io/v1/_ping: ne
Jun 28 12:17:23 hostname docker[26682]: time="2017-06-28T12:17:23.522568083Z" level=warning msg="Error getting v2 registry: Get https://gcr.io/v2/: net/http: TLS handshake ti
Jun 28 12:17:23 hostname docker[26682]: time="2017-06-28T12:17:23.522623816Z" level=error msg="Attempting next endpoint for pull after error: Get https://gcr.io/v2/: net/http
Jun 28 12:17:25 hostname docker[26682]: time="2017-06-28T12:17:25.550683773Z" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned
Jun 28 12:17:33 hostname docker[26682]: time="2017-06-28T12:17:33.615292807Z" level=error msg="Attempting next endpoint for pull after error: Get https://gcr.io/v1/_ping: net
Created 06-28-2017 10:28 AM
Hi,
This seems like a network/firewall/proxy issue to me.
Could you try a manual pull for the image which is shown in the logs?
docker pull gcr.io/google_containers/pause-amd64:3.0
Thanks,
Peter
Created 06-28-2017 10:34 AM
Created 06-28-2017 11:56 PM
cdsw logs shows the output below:
[root@hostname ~]# cdsw logs
Generating Cloudera Data Science Workbench diagnostic bundle...
Checking system basics...
Saving kernel parameters...
Checking validation output...
Checking application configuration...
Checking disks...
Checking Hadoop configuration...
Checking network...
Checking system services...
Checking Docker...
Checking Kubernetes...
Checking Kubelet...
Checking application services...
Checking cluster info...
Checking app cluster info...
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
Exporting user ids...
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
Checking system logs...
Producing logs tarball...
Logs saved to: cdsw-logs-hostname-2017-06-29--06-49-53.tar.gz
Redacting logs...
Producing redacted logs tarball...
Redacted logs saved to: cdsw-logs-hostname-2017-06-29--06-49-53.redacted.tar.gz
Cleaning up...
[root@hostname ~]#
Created 06-29-2017 12:15 AM
Hi Krishna,
Did the init command move forward, or is it still stuck here?
created API client, waiting for the control plane to become ready
Could you run the cdsw status command and send us the output?
Thanks,
Peter
Created 06-29-2017 12:20 AM
Hi Peter,
It is still not moving forward. Below is the cdsw status output:
[root@hostname ~]# cdsw status
Cloudera Data Science Workbench Status
Service Status
docker: active
kubelet: active
nfs: active
Checking kernel parameters...
Node Status
Cloudera Data Science Workbench is not ready yet: kubectl command failed
Kindly check my earlier update; while collecting the logs, it shows:
The connection to the server 10.x.x.x:6443 was refused - did you specify the right host or port?
Created 06-29-2017 12:29 AM
Hi,
Your issue is still the same: the kube-system Docker images can't be downloaded, as we saw in the 'systemctl status docker' and 'journalctl -u docker' outputs. As a consequence, 'cdsw logs' prints the "The connection to the server (...) was refused" messages when it tries to get the details (kubectl cluster-info dump) of your Kubernetes system.
I would investigate why you are getting the timeout when you run the pull command:
docker pull gcr.io/google_containers/pause-amd64:3.0
There is most likely a problem with your firewall/proxy configuration.
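To see which images Docker is failing to find, the journal lines can be reduced to a list of names. This is only a rough sketch: the function name failed_image_lookups is made up for illustration, and it assumes the "Handler for GET /images/<name>/json" line format shown in the journalctl output earlier in this thread. Lines that the terminal cut off before "/json" will not match.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct image names found in
# "Handler for GET /images/<name>/json" error lines read from stdin.
# Lines truncated before "/json" are silently skipped.
failed_image_lookups() {
  grep -o 'GET /images/[^ ]*/json' \
    | sed -e 's#^GET /images/##' -e 's#/json$##' \
    | sort -u
}

# Example (needs access to the journal, so it is left commented out):
# journalctl -u docker --no-pager | failed_image_lookups
```

Running journalctl with --no-pager prints the whole journal instead of one screen at a time, so the list is not limited to the first page of output.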
Regards,
Peter
Created 06-29-2017 12:44 AM
Hi Peter,
We are not using a proxy, and the firewall is also stopped.
When running kubectl cluster-info dump, I get the error below:
[root@hostname cdsw-logs-hostname-2017-06-29--06-49-53]# kubectl cluster-info dump
The connection to the server localhost:8080 was refused - did you specify the right host or port?
I have shared a link to the cdsw log files; kindly check your private messages. I hope this gives more information.
Thanks
Krishna
Created 06-29-2017 01:25 AM
Hi Krishna,
I see that you configured NO_PROXY even though you haven't set any HTTP_PROXY/HTTPS_PROXY.
NO_PROXY="localhost,127.0.0.1"
This is not needed. I tried to reproduce the issue by adding this to my configuration and running cdsw reset/cdsw init, but it didn't cause any problems for me. Regardless, I think it's worth removing it and running cdsw reset/cdsw init again.
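A quick way to spot this combination in a config file is sketched below. The function names has_value and check_proxy_conf are hypothetical, and the sketch assumes the file uses plain KEY="value" lines like the NO_PROXY line quoted above. It only reports the mismatch and does not change anything.

```shell
#!/bin/sh
# Hypothetical helper: succeed if KEY ($1) has a non-empty value in file ($2).
# Assumes simple KEY="value" or KEY=value lines.
has_value() {
  grep -Eq "^[[:space:]]*$1=\"?[^\"[:space:]]+" "$2"
}

# Report when NO_PROXY is set although neither HTTP_PROXY nor HTTPS_PROXY is.
check_proxy_conf() {
  conf="$1"
  if has_value NO_PROXY "$conf" \
     && ! has_value HTTP_PROXY "$conf" \
     && ! has_value HTTPS_PROXY "$conf"; then
    echo "NO_PROXY is set without HTTP_PROXY/HTTPS_PROXY"
    return 1
  fi
  echo "proxy settings look consistent"
}

# Example, using the config path printed by cdsw init in this thread:
# check_proxy_conf /etc/cdsw/config/cdsw.conf
```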
Regards,
Peter
Created 06-29-2017 01:40 AM
Created 06-29-2017 05:14 AM
Created on 06-29-2017 09:05 PM - edited 06-29-2017 10:47 PM
@peter_ableda Any update?
The same installation works fine in VMs. I'm doing this one on physical machines.
Created 07-02-2017 11:53 PM
@peter_ableda Any update?
Created 07-05-2017 01:29 AM
Hi Krishna,
So the docker pull gcr.io/google_containers/pause-amd64:3.0 times out for you. Can you even reach the gcr.io site? Could you try to ping it?
$ ping gcr.io
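Since the Docker log shows a TLS handshake timeout, and ping only tests ICMP, it may also help to probe the HTTPS endpoint directly. The sketch below is an assumption-laden example, not an official check: the function names classify_http_code and check_registry are invented, and the 15-second limit is arbitrary. It relies on curl reporting an HTTP code of 000 when no HTTP response was received at all.

```shell
#!/bin/sh
# Hypothetical helper: interpret curl's %{http_code}. A value of 000 means
# no HTTP response arrived (DNS failure, refused connection, or a TLS
# handshake that never completed). Any other code means the server answered.
classify_http_code() {
  case "$1" in
    000|"") echo "unreachable" ;;
    *)      echo "reachable" ;;
  esac
}

# Probe an HTTPS URL with a hard time limit and print the verdict.
check_registry() {
  url="$1"
  code=$(curl -s -o /dev/null --max-time 15 -w '%{http_code}' "$url" || true)
  echo "$url -> HTTP $code ($(classify_http_code "$code"))"
}

# Example (performs a network request, so it is left commented out):
# check_registry https://gcr.io/v2/
```

Any HTTP status here, even an error status, shows that the TLS handshake completed, which would point away from the network path and toward Docker's own configuration.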
Thanks,
Peter
Created 07-06-2017 04:55 AM
Created 07-10-2017 01:18 PM
Did you find any solution to this error? I am getting the same one.