CDSW - cdsw init does not finish

Contributor

I'm trying to install CDSW on RHEL 7.2, but cdsw init does not finish. The final status is:

Cloudera Data Science Workbench Status

Service Status
docker: active
kubelet: active
nfs: active
Checking kernel parameters...

Node Status
NAME STATUS AGE STATEFUL
dn183.pf4h.local Ready 1h <none>

System Pod status
NAME READY STATUS RESTARTS AGE
dummy-2088944543-kw60c 1/1 Running 0 1h
etcd-dn183.pf4h.local 1/1 Running 0 1h
kube-apiserver-dn183.pf4h.local 1/1 Running 0 1h
kube-controller-manager-dn183.pf4h.local 1/1 Running 0 1h
kube-discovery-1150918428-m9kup 1/1 Running 0 1h
kube-dns-654381707-f5yo4 2/3 Running 32 1h
kube-proxy-2zoxp 1/1 Running 0 1h
kube-scheduler-dn183.pf4h.local 1/1 Running 0 1h
weave-net-2s426 2/2 Running 0 1h

Cloudera Data Science Workbench Pod Status


WARNING: Unable to get details for role [cron].
Cloudera Data Science Workbench is not ready yet: the cluster specification is incomplete
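
A minimal sketch of how the restarting kube-dns pod (2/3 ready, 32 restarts) can be inspected on the master node; the pod name is taken from the status above, and the kubedns container name is an assumption:

# Run as root on the CDSW master (kubectl is set up by cdsw init)
kubectl --namespace=kube-system get pods -o wide
kubectl --namespace=kube-system describe pod kube-dns-654381707-f5yo4
# The container name may differ; "describe" lists the containers in the pod
kubectl --namespace=kube-system logs kube-dns-654381707-f5yo4 --container=kubedns --previous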

 

 


7 REPLIES

Contributor

Here is the output of cdsw init:

[root@dn183 ~]# cdsw init
Using user-specified config file: /etc/cdsw/config/cdsw.conf
Prechecking OS Version........[OK]
Prechecking scaling limits for processes........[OK]
Prechecking scaling limits for open files........[OK]
Prechecking that iptables are not configured........[OK]
Prechecking that SELinux is disabled........[OK]
Prechecking configured block devices and mountpoints........[OK]
Prechecking kernel parameters........[OK]
Prechecking that docker block devices are of adequate size........[OK]
Prechecking that application block devices are of adequate size........[OK]
Prechecking size of root volume........[OK]
Prechecking that CDH gateway roles are configured........[OK]
Prechecking that /etc/krb5 file is not a placeholder........
WARNING: The Kerberos configuration file [/etc/krb5.conf] seems to be a placeholder. If your CDH cluster is Kerberized, please copy /etc/krb5.conf to these Cloudera Data Science Workbench nodes.
Press enter to continue.

Prechecking parcel paths........[OK]
Prechecking CDH client configurations........[OK]
Prechecking Java version........[OK]
Prechecking Java distribution........[OK]
Creating docker thinpool if it does not exist
  Volume group "docker" not found
  Cannot process volume group docker
Unmounting /dev/sdb
umount: /dev/sdb: not mounted
Removing Docker volume groups.
  Volume group "docker" not found
  Cannot process volume group docker
  Volume group "docker" not found
  Cannot process volume group docker
Cleaning up docker directories...
  Physical volume "/dev/sdb" successfully created
  Volume group "docker" successfully created
WARNING: xfs signature detected on /dev/docker/thinpool at offset 0. Wipe it? [y/n]: y
  Wiping xfs signature on /dev/docker/thinpool.
  Logical volume "thinpool" created.
  Logical volume "thinpoolmeta" created.
  WARNING: Converting logical volume docker/thinpool and docker/thinpoolmeta to pool's data and metadata volumes.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  Converted docker/thinpool to thin pool.
  Logical volume "thinpool" changed.
Initialize application storage at /var/lib/cdsw
[/dev/sdc] is formatted as [] and not ext4, hence needs to be formatted.

WARNING: Formatting will erase all project files and data!
Are you sure? y
Starting rpc-statd...
Enabling rpc-statd...
Starting nfs-idmapd...
Enabling nfs-idmapd...
Starting rpcbind...
Enabling rpcbind...
Starting nfs-server...
Enabling nfs-server...
Proceeding to create storage for application...
mke2fs 1.42.9 (28-Dec-2013)
/dev/sdc is entire device, not just one partition!
Proceed anyway? (y,n) y
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
73228288 inodes, 292896768 blocks
14644838 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2441084928
8939 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

Enabling node with IP [192.168.239.183]...
Node [192.168.239.183] added to nfs export list successfully.
Starting rpc-statd...
Enabling rpc-statd...
Starting nfs-idmapd...
Enabling nfs-idmapd...
Starting rpcbind...
Enabling rpcbind...
Starting nfs-server...
Enabling nfs-server...
Starting docker...
Enabling docker...
Starting ntpd...
Enabling ntpd...
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /etc/systemd/system/kubelet.service.
Initializing cluster...
Running pre-flight checks
<master/tokens> generated token: "afde63.f3bb72d9a16f9a74"
<master/pki> generated Certificate Authority key and certificate:
Issuer: CN=kubernetes | Subject: CN=kubernetes | CA: true
Not before: 2017-05-19 08:43:05 +0000 UTC Not After: 2027-05-17 08:43:05 +0000 UTC
Public: /etc/kubernetes/pki/ca-pub.pem
Private: /etc/kubernetes/pki/ca-key.pem
Cert: /etc/kubernetes/pki/ca.pem
<master/pki> generated API Server key and certificate:
Issuer: CN=kubernetes | Subject: CN=kube-apiserver | CA: false
Not before: 2017-05-19 08:43:05 +0000 UTC Not After: 2018-05-19 08:43:06 +0000 UTC
Alternate Names: [192.168.239.183 100.77.0.1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local]
Public: /etc/kubernetes/pki/apiserver-pub.pem
Private: /etc/kubernetes/pki/apiserver-key.pem
Cert: /etc/kubernetes/pki/apiserver.pem
<master/pki> generated Service Account Signing keys:
Public: /etc/kubernetes/pki/sa-pub.pem
Private: /etc/kubernetes/pki/sa-key.pem
<master/pki> created keys and certificates in "/etc/kubernetes/pki"
<util/kubeconfig> created "/etc/kubernetes/kubelet.conf"
<util/kubeconfig> created "/etc/kubernetes/admin.conf"
<master/apiclient> created API client configuration
<master/apiclient> created API client, waiting for the control plane to become ready
<master/apiclient> all control plane components are healthy after 42.548542 seconds
<master/apiclient> waiting for at least one node to register and become ready
<master/apiclient> first node is ready after 4.002406 seconds
<master/apiclient> attempting a test deployment
<master/apiclient> test deployment succeeded
<master/discovery> created essential addon: kube-discovery, waiting for it to become ready
<master/discovery> kube-discovery is ready after 24.002057 seconds
<master/addons> created essential addon: kube-proxy
<master/addons> created essential addon: kube-dns

Kubernetes master initialised successfully!

You can now join any number of machines by running the following on each node:

kubeadm join --token=afde63.f3bb72d9a16f9a74 192.168.239.183

Added bootstrap token KUBE_TOKEN to /etc/cdsw/config/cdsw.conf

node "dn183.pf4h.local" tainted
daemonset "weave-net" created
Waiting for kube-system cluster to come up. This could take a few minutes...
Some pods in kube-system have not yet started.  This may take a few minutes.
Waiting for 10 seconds before checking again...
Some pods in kube-system have not yet started.  This may take a few minutes.
Waiting for 10 seconds before checking again...
Some pods in kube-system have not yet started.  This may take a few minutes.
...
...
...
Waiting for 10 seconds before checking again...

ERROR:: Unable to bring up kube-system cluster.: 1

ERROR:: Unable to start clusters: 1

Contributor

I found the problem myself. It was a problem with the proxy settings: the proxy was also being used for the internal status requests on the 100.66.0.0/24 network. I added all of those addresses to the NO_PROXY list, and now the workbench comes up. A hint about this in the documentation would be nice.
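
A minimal sketch of the fix described above, assuming the NO_PROXY variable is set in the shell environment of the root session that runs cdsw init (some CDSW versions also read proxy settings from /etc/cdsw/config/cdsw.conf); the host name and addresses are the ones appearing in this thread, and not every proxy client accepts CIDR notation in NO_PROXY:

# Exclude the CDSW node itself and the internal cluster networks from the proxy.
# 100.66.0.0/24 is the internal status network mentioned above; 100.77.0.1 is the
# Kubernetes service address from the apiserver certificate in the init output.
export NO_PROXY="127.0.0.1,localhost,dn183.pf4h.local,192.168.239.183,100.66.0.0/24,100.77.0.1"
export no_proxy="$NO_PROXY"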

Community Manager

Thanks for reporting back with your solution. I'll forward it to the team so they can review it. 


Cy Jervis, Manager, Community Program

New Contributor

 

The connection to the server 192.168.31.140:6443 was refused - did you specify the right host or port?
[the same message is repeated many times]

Exporting user ids...

The connection to the server 192.168.31.140:6443 was refused - did you specify the right host or port?
[the same message is repeated]

Checking system logs...

Producing logs tarball...

Logs saved to: cdsw-logs-cdh3-2017-07-06--18-46-59.tar.gz

Redacting logs...

Producing redacted logs tarball...

Redacted logs saved to: cdsw-logs-cdh3-2017-07-06--18-46-59.redacted.tar.gz

Cleaning up...

[root@cdh3 ~]# docker pull gcr.io/google_containers/pause-amd64:3.0

^C

[root@cdh3 ~]# kubectl cluster-info dump

The connection to the server localhost:8080 was refused - did you specify the right host or port?

[root@cdh3 ~]# ping gcr.io

PING gcr.io (64.233.189.82) 56(84) bytes of data.

^C

--- gcr.io ping statistics ---

12 packets transmitted, 0 received, 100% packet loss, time 11000ms

 

[root@cdh3 ~]# ping gcr.io

PING gcr.io (64.233.189.82) 56(84) bytes of data.

New Contributor

Because I'm in China, I would like to ask: does cdsw init need to download anything over the network during initialization?
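
If the host can only reach gcr.io through a proxy, a systemd drop-in for the docker daemon is one common way to let it pull images; a sketch only, where proxy.example.com:3128 is a placeholder and the NO_PROXY entries mirror the addresses seen in this thread:

# /etc/systemd/system/docker.service.d/http-proxy.conf (placeholder proxy address)
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
Environment="HTTPS_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=localhost,127.0.0.1,192.168.31.140"

# Reload and restart docker, then retry the pull from the output above
systemctl daemon-reload
systemctl restart docker
docker pull gcr.io/google_containers/pause-amd64:3.0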

New Contributor

Is there an offline setup mode for Data Science Workbench? Please contact me if possible. Thank you.

Explorer

Hi,

 

Can anyone guide me on setting up FTP between CDSW and a Unix server, to upload and download files, using R?
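
A sketch of what the transfer could look like from a terminal session using curl, which R can also invoke via system(); the host, credentials, and paths below are placeholders:

# Upload a file from the CDSW project to the Unix server's FTP service
curl -T /home/cdsw/data/output.csv --user myuser:mypass ftp://unix-server.example.com/incoming/

# Download a file from the FTP server into the CDSW project
curl -o /home/cdsw/data/input.csv --user myuser:mypass ftp://unix-server.example.com/outgoing/input.csv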