Created on 06-07-2017 02:27 AM - edited 09-16-2022 04:43 AM
I installed Cloudera Data Science Workbench on a gateway node, and it appears to be up and running without any errors. However, for some reason the Docker containers cannot reach the internet, so I can't install any packages in them. The exact error message is:
Step 2/12 : RUN apt-get update -y
 ---> Running in 435f1addc906
Err:1 http://security.debian.org testing/updates InRelease
  Temporary failure resolving 'security.debian.org'
Err:2 http://deb.debian.org/debian testing InRelease
  Temporary failure resolving 'deb.debian.org'
Err:3 http://http.debian.net/debian sid InRelease
  Temporary failure resolving 'http.debian.net'
Err:4 http://deb.debian.org/debian testing-updates InRelease
  Temporary failure resolving 'deb.debian.org'
Reading package lists...
W: Failed to fetch http://deb.debian.org/debian/dists/testing/InRelease Temporary failure resolving 'deb.debian.org'
W: Failed to fetch http://deb.debian.org/debian/dists/testing-updates/InRelease Temporary failure resolving 'deb.debian.org'
W: Failed to fetch http://security.debian.org/dists/testing/updates/InRelease Temporary failure resolving 'security.debian.org'
W: Failed to fetch http://http.debian.net/debian/dists/sid/InRelease Temporary failure resolving 'http.debian.net'
W: Some index files failed to download. They have been ignored, or old ones used instead.
The output of cdsw status:
Cloudera Data Science Workbench Status

Service Status
docker: active
kubelet: active
nfs: active
Checking kernel parameters...

Node Status
NAME                              STATUS  AGE  STATEFUL
ip-xx.eu-west-1.compute.internal  Ready   15d  true

System Pod status
NAME                                                      READY  STATUS   RESTARTS  AGE
dummy-2088944543-pfazy                                    1/1    Running  0         15d
etcd-ip-xx.eu-west-1.compute.internal                     1/1    Running  0         15d
kube-apiserver-ip-xx.eu-west-1.compute.internal           1/1    Running  0         15d
kube-controller-manager-ip-xx.eu-west-1.compute.internal  1/1    Running  0         15d
kube-discovery-1150918428-50nmx                           1/1    Running  0         15d
kube-dns-3873593988-gg6s2                                 3/3    Running  0         15d
kube-proxy-0j15p                                          1/1    Running  0         15d
kube-scheduler-ip-xx.eu-west-1.compute.internal           1/1    Running  0         15d
node-problem-detector-v0.1-ktr13                          1/1    Running  0         15d
weave-net-r8j2g                                           2/2    Running  0         15d

Cloudera Data Science Workbench Pod Status
NAME                                 READY  STATUS     RESTARTS  AGE  ROLE
cron-3971587342-ddoca                1/1    Running    0         15d  cron
db-4066525870-qchwg                  1/1    Running    0         15d  db
db-migrate-abec968-oxxek             0/1    Completed  0         15d  db-migrate
dhqrwn5eobowq3ea                     0/2    Pending    0         4d   console
engine-deps-ufifx                    1/1    Running    0         15d  engine-deps
ingress-controller-2976678207-g88f5  1/1    Running    0         15d  ingress-controller
livelog-2494298876-chy37             1/1    Running    0         15d  livelog
reconciler-577027981-slrwk           1/1    Running    0         15d  reconciler
spark-port-forwarder-7ixp4           1/1    Running    0         15d  spark-port-forwarder
web-1304125449-2of76                 1/1    Running    2         15d  web
web-1304125449-q3rbd                 1/1    Running    0         15d  web
web-1304125449-vydxd                 1/1    Running    1         15d  web
What do I need to change to have internet access inside the docker containers?
Thanks!
Created 06-07-2017 10:12 AM
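The Docker daemon within Cloudera Data Science Workbench runs with the --iptables=false option, so Docker does not add the NAT rules that containers on its default bridge network rely on for outbound traffic. This means that you need to build with docker build --net=host if you need internet connectivity.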
Note that Cloudera does not support or recommend using the internal Docker for builds or third-party use cases. Doing so will break how Cloudera Data Science Workbench allocates CPU and memory resources to jobs and sessions.
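For illustration, here is a minimal Python sketch of the build invocation described above. It assumes a Docker client recent enough to accept `docker build --network` (the build-time spelling of `--net`, which to my knowledge arrived in Docker 1.13); on older clients the flag may simply not exist. The helper names and the tag `my-engine:1` are hypothetical, and the caveat about the internal Docker daemon being unsupported for builds still applies.

```python
import subprocess


def build_command(tag, context=".", network="host"):
    """Return the argv for a docker build whose RUN steps use the given network.

    Assumption: the installed Docker client accepts `docker build --network`.
    """
    return ["docker", "build", f"--network={network}", "-t", tag, context]


def run_build(tag, context="."):
    """Run the build; raises CalledProcessError if Docker exits non-zero."""
    subprocess.run(build_command(tag, context), check=True)


if __name__ == "__main__":
    # "my-engine:1" is a placeholder tag, not a name that CDSW defines.
    print(" ".join(build_command("my-engine:1")))
```

Printing the command first, rather than running it, makes it easy to check the flags against `docker build --help` on the node before touching the daemon.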
Created 06-14-2017 04:17 AM
Thanks for the answer!
Two questions: it seems that docker build does not have the --net option, only docker run. What can I do to include this setting in the build? What's the supported way of adding/changing a docker image for CDSW? Should I pull it from a repo?
Created 06-14-2017 03:00 PM
For real installations, you should pull from a repository. This ensures that all the nodes in your CDSW cluster have access to the image, not just the node where you built it. Moreover, you should not assume that your Docker image store is persistent across upgrades or in long-running clusters, where we may evict less-used images to free space. By pushing your custom images to a repository, you ensure that they are never lost to image eviction policies or other administration tasks.
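As a rough sketch of that workflow, the Python helper below assembles the standard `docker tag` and `docker push` commands for publishing a locally built image. The registry host `registry.example.com`, the repository `team/my-engine`, and the function names are placeholders made up for this example, not anything CDSW prescribes; it also assumes you have already authenticated against the registry with `docker login`.

```python
import subprocess


def publish_commands(local_tag, registry, repository, version):
    """Return the argv lists that tag a local image and push it to a registry."""
    remote = f"{registry}/{repository}:{version}"
    return [
        ["docker", "tag", local_tag, remote],  # give the image its registry name
        ["docker", "push", remote],            # upload it so every node can pull it
    ]


def publish(local_tag, registry, repository, version):
    """Run each command in order, stopping at the first failure."""
    for argv in publish_commands(local_tag, registry, repository, version):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    for argv in publish_commands(
        "my-engine:1", "registry.example.com", "team/my-engine", "1"
    ):
        print(" ".join(argv))
```

Building on a separate machine and pushing from there keeps the build off the internal CDSW Docker daemon entirely.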