Support Questions

MSharma · 07-10-2017

Hi,

I am getting an error while running cdsw init:

[root@docker ~]# cdsw init
Using user-specified config file: /etc/cdsw/config/cdsw.conf
Prechecking OS Version........[OK]
Prechecking scaling limits for processes........
WARNING: Cloudera Data Science Workbench recommends that all users have a max-user-processes limit of at least 65536.
It is currently set to [65535] as per 'ulimit -u'
Press enter to continue

Prechecking scaling limits for open files........
WARNING: Cloudera Data Science Workbench recommends that all users have a max-open-files limit set to 1048576.
It is currently set to [65535] as per 'ulimit -n'
Press enter to continue

Prechecking that iptables are not configured........[OK]
Prechecking that SELinux is disabled........[OK]
Prechecking configured block devices and mountpoints........[OK]
Prechecking kernel parameters........[OK]
Prechecking that docker block devices are of adequate size........[OK]
Prechecking that application block devices are of adequate size........[OK]
Prechecking size of root volume........
WARNING: The recommended minimum root volume size is 100G. Press enter to continue

Prechecking that CDH gateway roles are configured........[OK]
Prechecking that /etc/krb5 file is not a placeholder........[OK]
Prechecking parcel paths........[OK]
Prechecking CDH client configurations........[OK]
Prechecking Java version........[OK]
Prechecking Java distribution........[OK]
Creating docker thinpool if it does not exist
Volume group "docker" not found
Cannot process volume group docker
Unmounting /dev/mapper/data01-data01
umount: /dev/mapper/data01-data01: not mounted
Removing Docker volume groups.
Volume group "docker" not found
Cannot process volume group docker
Volume group "docker" not found
Cannot process volume group docker
Cleaning up docker directories...
Wiping ext4 signature on /dev/mapper/data01-data01.
Physical volume "/dev/data01/data01" successfully created
Volume group "docker" successfully created
Logical volume "thinpool" created.
Logical volume "thinpoolmeta" created.
WARNING: Converting logical volume docker/thinpool and docker/thinpoolmeta to pool's data and metadata volumes.
THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Converted docker/thinpool to thin pool.
Logical volume "thinpool" changed.
Initialize application storage at /var/lib/cdsw
Disabling node with IP [10.11.160.64]...
Node [10.11.160.64] removed from nfs export list successfully.
Stopping rpc-statd...
Stopping nfs-idmapd...
Stopping rpcbind...
Stopping nfs-server...
Removing entry from /etc/fstab...
Skipping format since volumes are already set correctly.
Adding entry to /etc/fstab...
Mounting [/var/lib/cdsw]...
Starting rpc-statd...
Enabling rpc-statd...
Starting nfs-idmapd...
Enabling nfs-idmapd...
Starting rpcbind...
Enabling rpcbind...
Starting nfs-server...
Enabling nfs-server...
Enabling node with IP [10.11.160.64]...
Node [10.11.160.64] added to nfs export list successfully.
Starting rpc-statd...
Enabling rpc-statd...
Starting nfs-idmapd...
Enabling nfs-idmapd...
Starting rpcbind...
Enabling rpcbind...
Starting nfs-server...
Enabling nfs-server...
Starting docker...
Enabling docker...
Starting ntpd...
Enabling ntpd...
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /etc/systemd/system/kubelet.service.

ERROR:: Unable to reset weave networking state.: 125
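
Separately from the final error, the two precheck warnings above compare the current `ulimit -u` and `ulimit -n` values with recommended minimums. A minimal sketch of that comparison follows. The `check_limit` helper is hypothetical and not part of cdsw, and the thresholds are simply the numbers quoted in the warnings.

```shell
# check_limit NAME CURRENT RECOMMENDED
# Hypothetical helper (not part of cdsw): prints OK when CURRENT meets
# RECOMMENDED, or a warning otherwise. "unlimited" always passes.
check_limit() {
    if [ "$2" = "unlimited" ] || [ "$2" -ge "$3" ]; then
        echo "$1: OK ($2)"
    else
        echo "$1: WARNING, $2 is below the recommended $3"
    fi
}

# Usage in bash, with the thresholds quoted in the precheck warnings:
#   check_limit max-user-processes "$(ulimit -u)" 65536
#   check_limit max-open-files "$(ulimit -n)" 1048576
```

As the output shows, these warnings are advisory: the precheck continues once Enter is pressed.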

MSharma · 07-11-2017

Which port does the CDSW web URL run on?

#
# This domain for DNS and is unrelated to Kerberos or LDAP domains.
DOMAIN="cdsw.company.com"

# IPv4 address for the master node that is reachable from the worker nodes.
#
# Within an AWS VPC, MASTER_IP should be set to the internal IP
# of the master node; for instance, "10.251.50.12" corresponding to
# master node name of ip-10-251-50-12.ec2.internal.
MASTER_IP="10.11.140.64"

Also, for DOMAIN="cdsw.company.com": should I put my company domain in the form cdsw.test.com, or just test.com?

tristanzajonc · 07-10-2017

Does your node have internet access or a properly configured HTTP(S)_PROXY? This error can occur when Docker cannot download images from Cloudera's Docker registry. You may see additional information using "systemctl status docker" or "journalctl -u docker".
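
The `journalctl -u docker` output can be long, so it may help to keep only the warnings and errors. A minimal sketch, assuming the daemon writes `level=...` fields in its log lines; the helper name `filter_docker_problems` is hypothetical:

```shell
# Hypothetical helper: reads Docker daemon log lines on stdin and keeps
# only those logged at warning or error level.
filter_docker_problems() {
    grep -E 'level=(warning|error)'
}

# Usage on a systemd host (assumes the unit is named "docker"):
#   journalctl -u docker --no-pager --since "15 min ago" | filter_docker_problems
```

Lines that mention a failed pull or a registry endpoint are usually the ones worth posting back here.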

Please let us know if you see additional errors or changing your configuration resolves the issue.

Tristan

MSharma · 07-10-2017

After adding the proxy settings it moved on, and it is now stuck at:
<util/kubeconfig> created "/etc/kubernetes/admin.conf"
<master/apiclient> created API client configuration
<master/apiclient> created API client, waiting for the control plane to become ready

[root@docker ~]# systemctl status docker
● docker.service - docker
Loaded: loaded (/etc/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2017-07-10 14:04:12 EDT; 7min ago
Docs: https://docs.docker.com
Main PID: 20705 (dockerd)
Memory: 51.1M
CGroup: /system.slice/docker.service
├─20705 dockerd --log-driver=journald -s devicemapper --storage-opt dm.basesize=100G --storage-opt dm.thinpooldev=/dev/mapper...
└─20720 docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout...

journalctl -u docker

10T14:12:06.695585728-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image: gcr.io/google
10T14:12:07.695941688-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image: gcr.io/google
10T14:12:12.395203820-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image: gcr.io/google
10T14:12:12.695987744-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image: gcr.io/google
10T14:12:17.694514892-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image: gcr.io/google
10T14:12:20.696049524-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image: gcr.io/google
10T14:12:25.393880714-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image: gcr.io/google
10T14:12:27.695855879-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image: gcr.io/google
10T14:12:30.695721595-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image: gcr.io/google
10T14:12:32.695268088-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image: gcr.io/google
10T14:12:37.392576223-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image: gcr.io/google

[root@docker ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.repository.cloudera.com/cdsw/1.0.1/third-party/weaveexec 1.9.0 300f92429697 5 months ago 90.4 MB

MSharma · 07-10-2017

14:15:54.742150498-04:00" level=warning msg="Error getting v2 registry: Get https://gcr.io/v2/: authenticationrequired"
14:15:54.742216009-04:00" level=error msg="Attempting next endpoint for pull after error: Get https://gcr.io/v2/: authenticationrequired"

What authentication is it looking for?

tristanzajonc · 07-10-2017

MSharma,

Can you pull the image manually?

docker pull gcr.io/google_containers/pause-amd64:3.0

There should be no need for authentication. You are likely facing some proxy misconfiguration or certificate validation error.
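
One way to narrow this down, sketched under assumptions: ask curl for the status codes it sees when reaching https://gcr.io/v2/ through the proxy. A 407 means the proxy itself is demanding credentials, while a 200 or 401 from a registry's /v2/ endpoint means the registry was reached. The function names and the proxy URL in the usage comment are hypothetical.

```shell
# classify_registry_probe CONNECT_CODE HTTP_CODE
# CONNECT_CODE is the proxy's reply to CONNECT, HTTP_CODE the final reply.
classify_registry_probe() {
    if [ "$1" = "407" ] || [ "$2" = "407" ]; then
        echo "proxy requires authentication"
    elif [ "$2" = "200" ] || [ "$2" = "401" ]; then
        echo "registry reachable"
    else
        echo "unexpected response"
    fi
}

# probe_registry PROXY_URL
# Prints a one-line verdict for https://gcr.io/v2/ as seen through PROXY_URL.
probe_registry() {
    codes=$(curl -sS -o /dev/null -w '%{http_connect} %{http_code}' \
        --proxy "$1" https://gcr.io/v2/)
    # Left unquoted on purpose so the two codes become two arguments.
    classify_registry_probe $codes
}

# Usage (the proxy URL is a placeholder, not taken from this thread):
#   probe_registry http://proxy.example.com:3128
```

If the verdict is "proxy requires authentication", the credentials or proxy URL that Docker is using are the first thing to check.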

Best,

Tristan

MSharma · 07-10-2017

No, the manual pull fails as well:

[root@docker ~]# docker pull gcr.io/google_containers/pause-amd64:3.0
Error response from daemon: Get https://gcr.io/v1/_ping: authenticationrequired

5:21:54.693338443-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image:
15:21:55.693601137-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image:
15:21:55.955985334-04:00" level=warning msg="Error getting v2 registry: Get https://gcr.io/v2/: authenticationrequired"
15:21:55.956065010-04:00" level=error msg="Attempting next endpoint for pull after error: Get https://gcr.io/v2/: authenticationrequired"
15:21:56.000584332-04:00" level=error msg="Attempting next endpoint for pull after error: Get https://gcr.io/v1/_ping: authenticationrequired"
15:21:56.000652326-04:00" level=error msg="Handler for POST /images/create returned error: Get https://gcr.io/v1/_ping: authenticationrequired"
15:21:58.396828751-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image:
15:21:58.696481059-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image:
15:22:06.695327527-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image:
15:22:07.692817530-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image:
15:22:13.393313446-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image:
15:22:14.693889069-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image:
15:22:19.693560635-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image:
15:22:22.693515008-04:00" level=error msg="Handler for GET /images/gcr.io/google_containers/pause-amd64:3.0/json returned error: No such image:
15:22:22.787636243-04:00" level=warning msg="Error getting v2 registry: Get https://gcr.io/v2/: authenticationrequired"
15:22:22.787671995-04:00" level=error msg="Attempting next endpoint for pull after error: Get https://gcr.io/v2/: authenticationrequired"
15:22:22.830792991-04:00" level=error msg="Attempting next endpoint for pull after error: Get https://gcr.io/v1/_ping: authenticationrequired"
15:22:22.830856352-04:00" level=error msg="Handler for POST /images/create returned error: Get https://gcr.io/v1/_ping: authenticationrequired"

But if it can pull other images, why is it failing for this one?

REPOSITORY TAG IMAGE ID CREATED SIZE
docker.repository.cloudera.com/cdsw/1.0.1/third-party/weaveexec 1.9.0 300f92429697 5 months ago 90.4 MB

MSharma · 07-10-2017

It moved on after I changed the proxy settings, and now it is waiting in what looks like an infinite loop:

<master/addons> created essential addon: kube-proxy
<master/addons> created essential addon: kube-dns

Kubernetes master initialised successfully!

You can now join any number of machines by running the following on each node:

kubeadm join --token=2a582a.63ec0427495ec31c 10.17.160.64

Added bootstrap token KUBE_TOKEN to /etc/cdsw/config/cdsw.conf

node "docker.test.com" tainted
daemonset "weave-net" created
Waiting for kube-system cluster to come up. This could take a few minutes...
Some pods in kube-system have not yet started. This may take a few minutes.
Waiting for 10 seconds before checking again...
Some pods in kube-system have not yet started. This may take a few minutes.
Waiting for 10 seconds before checking again...
Some pods in kube-system have not yet started. This may take a few minutes.
Waiting for 10 seconds before checking again...
Some pods in kube-system have not yet started. This may take a few minutes.
Waiting for 10 seconds before checking again...
Some pods in kube-system have not yet started. This may take a few minutes.
Waiting for 10 seconds before checking again...
Some pods in kube-system have not yet started. This may take a few minutes.
Waiting for 10 seconds before checking again...
Some pods in kube-system have not yet started. This may take a few minutes.
Waiting for 10 seconds before checking again...
Some pods in kube-system have not yet started. This may take a few minutes.
Waiting for 10 seconds before checking again...
Some pods in kube-system have not yet started. This may take a few minutes.
Waiting for 10 seconds before checking again...
Some pods in kube-system have not yet started. This may take a few minutes.
Waiting for 10 seconds before checking again...
Some pods in kube-system have not yet started. This may take a few minutes.
Waiting for 10 seconds before checking again...
Some pods in kube-system have not yet started. This may take a few minutes.
Waiting for 10 seconds before checking again...

tristanzajonc · 07-10-2017

In another window you can check the status with "kubectl get pods". The installation process downloads approximately 5 GB of image data at this point, so it may take a while if your internet connection is slow.

If your proxy is misconfigured you may run into issues downloading specific images. You can test your proxy configuration by attempting to pull an image manually, as previously mentioned.
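
Note that `kubectl get pods` without a namespace flag lists only the `default` namespace, while the installer message refers to `kube-system`. A minimal sketch for spotting pods that have not started yet; the helper name `list_unready_pods` is hypothetical, and it assumes the NAME, READY, STATUS column order of a single-namespace listing:

```shell
# Hypothetical helper: reads `kubectl get pods` table output on stdin and
# prints the name and status of every pod that is neither Running nor
# Completed. Assumes the column order NAME READY STATUS.
list_unready_pods() {
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (assumes kubectl is configured on the master node):
#   kubectl get pods --namespace kube-system | list_unready_pods
```

An empty result means every listed pod reports Running or Completed. A pod that stays in a pull-related status for a long time points back at the proxy or registry access.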

Let us know if you continue to have issues.

Tristan

MSharma · 07-10-2017

OK, I have reset CDSW and started again, but this time it is stuck at:
<master/pki> created keys and certificates in "/etc/kubernetes/pki"
<util/kubeconfig> created "/etc/kubernetes/kubelet.conf"
<util/kubeconfig> created "/etc/kubernetes/admin.conf"
<master/apiclient> created API client configuration
<master/apiclient> created API client, waiting for the control plane to become ready
<master/apiclient> all control plane components are healthy after 18.555501 seconds
<master/apiclient> waiting for at least one node to register and become ready.
I checked the proxy and it looks OK.

kubectl get pods does not return anything.

MSharma · 07-10-2017

OK, after a couple of rounds of troubleshooting it moved on and completed:

Cloudera Data Science Workbench is not ready yet: some system pods are not ready

Master node configuration successful. The application may take up to 10
minutes to initially startup.

To check application status use:

$ watch cdsw status

But the status check shows:
Every 2.0s: cdsw status Mon Jul 10 23:49:23 2017

Cloudera Data Science Workbench Status

Service Status
docker: active
kubelet: active
nfs: active
Checking kernel parameters...

Node Status
Cloudera Data Science Workbench is not ready yet: kubectl command failed

[root@docker ~]# kubectl get pods --show-all
NAME READY STATUS RESTARTS AGE
cron-2934152315-mqsei 1/1 Running 0 20m
db-39862959-2gt3u 1/1 Running 0 20m
db-migrate-052787a-qhp19 0/1 Completed 0 20m
engine-deps-xfrsh 1/1 Running 0 20m
ingress-controller-3138093376-eacx8 1/1 Running 0 20m
livelog-1900214889-e8oys 1/1 Running 0 20m
reconciler-459456250-t057g 1/1 Running 0 20m
spark-port-forwarder-guxvb 1/1 Running 0 20m
web-3826671331-1fp2r 1/1 Running 0 20m
web-3826671331-66myb 1/1 Running 0 20m
web-3826671331-z8ark 1/1 Running 0 20m

Is it still downloading the images?

Cloudera Community

cdsw init failed