Created on 11-13-2019 09:08 AM - last edited on 11-13-2019 09:44 AM by VidyaSargur
All,
I'm getting errors while installing CDSW in my cluster.
Below is the output of the cdsw validate command.
[centos@cdsw ~]$ sudo cdsw validate
[Validating host configuration]
> Prechecking OS Version........[OK]
> Prechecking kernel Version........[OK]
> Prechecking that SELinux is disabled........[OK]
> Prechecking scaling limits for processes........[OK]
> Prechecking scaling limits for open files........
WARNING: Cloudera Data Science Workbench recommends that all users have a max-open-files limit set to 1048576.
It is currently set to [1024] as per 'ulimit -n'
Press enter to continue
> Loading kernel module [ip_tables]...
> Loading kernel module [iptable_nat]...
> Loading kernel module [iptable_filter]...
> Prechecking that iptables are not configured........
WARNING: Cloudera Data Science Workbench requires iptables, but does not support preexisting iptables rules. Press enter to continue
> Prechecking kernel parameters........[OK]
> Prechecking to ensure kernel memory accounting disabled:........[OK]
> Prechecking Java distribution and version........[OK]
> Checking unlimited Java encryption policy for AES........[OK]
> Prechecking size of root volume........[OK]
[Validating networking setup]
> Checking if kubelet iptables rules exist
The following chains are missing from iptables: [KUBE-EXTERNAL-SERVICES, WEAVE-NPC-EGRESS, WEAVE-NPC, WEAVE-NPC-EGRESS-ACCEPT, KUBE-SERVICES, WEAVE-NPC-INGRESS, WEAVE-NPC-EGRESS-DEFAULT, WEAVE-NPC-DEFAULT, WEAVE-NPC-EGRESS-CUSTOM, KUBE-FIREWALL]
WARNING:: Verification of iptables rules failed: 1
> Checking if DNS server is running on localhost
> Checking the number of DNS servers in resolv.conf
> Checking DNS entries for CDSW main domain
WARNING:: DNS doesn't resolve cdsw.company.fr to cdsw.company.fr; DNS is not configured properly: 1
> Checking reverse DNS entries for CDSW main domain
WARNING:: DNS doesn't resolve cdsw.company.fr to cdsw.company.fr; DNS is not configured properly: 1
> Checking DNS entries for CDSW wildcard domain
WARNING:: DNS doesn't resolve *.cdsw.company.fr to cdsw.company.fr; DNS is not configured properly: 1
> Checking that firewalld is disabled
[Validating Kubernetes versions]
> Checking kubernetes client version
> Checking kubernetes server version
WARNING:: Kubernetes server is not running, version couldn't be checked.: 1
[Validating NFS and Application Block Device setup]
> Checking if nfs or nfs-server is active and enabled
> Checking if rpcbind.socket is active and enabled
> Checking if rpcbind.service is active and enabled
> Checking if the project folder is exported over nfs
WARNING:: The projects folder /var/lib/cdsw/current/projects must be exported over nfs: 1
> Checking if application mountpoint exists
> Checking if the application directory is on a separate block device
WARNING:: The application directory is mounted on the root device.: 1
> Checking the root directory (/) free space
> Checking the application directory (/var/lib/cdsw) free space
[Validating Kubernetes cluster state]
> Checking if we have exactly one master node
WARNING:: There must be exactly one Kubernetes node labelled 'stateful=true': 1
> Checking if the Kubernetes nodes are ready
> Checking kube-apiserver pod
WARNING: Unable to reach k8s pod kube-apiserver.
WARNING: [kube-apiserver] pod(s) are not ready under kube-system namespace.
WARNING: Unable to bring up kube-apiserver in the kube-system cluster. Skipping other checks..
[Validating CDSW application]
> Checking connectivity over ingress
WARNING:: Could not curl the application over the ingress controller: 7
--------------------------------------------------------------------------
Errors detected.
Please review the issues listed above. Further details can be collected by
capturing logs from all nodes using "cdsw logs".
------------------------------------------------------------------------------------------------------------------------------
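An aside on the max-open-files warning above: one common way to raise the limit to the recommended value is a pair of entries in /etc/security/limits.conf, picked up at the next login. A minimal sketch, with the value taken from the warning itself (the exact mechanism can vary by distribution and by how the services are launched):

# /etc/security/limits.conf -- raise the open-files limit for all users
* soft nofile 1048576
* hard nofile 1048576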
Error 1) "DNS is unable to resolve the hostname": I have the following details in the /etc/hosts file (sorry if I've missed something elementary here):
[centos@cdsw ~]$ sudo cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.40.3 master.company.fr master
172.16.40.4 worker1.company.fr worker1
172.16.40.5 worker2.company.fr worker2
172.16.40.6 worker3.company.fr worker3
172.16.40.7 cdsw.company.fr cdws
172.16.40.7 *.cdsw.company.fr
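Note that /etc/hosts does not support wildcard entries, so the last line above has no effect; a wildcard record has to come from an actual DNS server. As a minimal sketch, dnsmasq can serve such a record with a single directive (the file name /etc/dnsmasq.d/cdsw.conf is hypothetical, and this assumes dnsmasq is the resolver the hosts actually use):

# Resolve cdsw.company.fr and every subdomain of it to the CDSW host
address=/cdsw.company.fr/172.16.40.7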
Error 2) "Unable to reach k8s pod kube-apiserver"
Does anyone know what is causing this issue?
Many thanks,
Created 11-14-2019 01:37 PM
Just an update on this: the DNS issue is now resolved, but there are still some errors related to Kubernetes. Does anyone have any pointers for me?
[centos@cdsw var]$ sudo cdsw validate
[Validating host configuration]
> Prechecking OS Version........[OK]
> Prechecking kernel Version........[OK]
> Prechecking that SELinux is disabled........[OK]
> Prechecking scaling limits for processes........[OK]
> Prechecking scaling limits for open files........
WARNING: Cloudera Data Science Workbench recommends that all users have a max-open-files limit set to 1048576.
It is currently set to [1024] as per 'ulimit -n'
Press enter to continue
> Loading kernel module [ip_tables]...
> Loading kernel module [iptable_nat]...
> Loading kernel module [iptable_filter]...
> Prechecking that iptables are not configured........[OK]
> Prechecking kernel parameters........[OK]
> Prechecking to ensure kernel memory accounting disabled:........[OK]
> Prechecking Java distribution and version........[OK]
> Checking unlimited Java encryption policy for AES........[OK]
> Prechecking size of root volume........[OK]
[Validating networking setup]
> Checking if kubelet iptables rules exist
The following chains are missing from iptables: [KUBE-EXTERNAL-SERVICES, WEAVE-NPC-EGRESS, WEAVE-NPC, WEAVE-NPC-EGRESS-ACCEPT, KUBE-SERVICES, WEAVE-NPC-INGRESS, WEAVE-NPC-EGRESS-DEFAULT, WEAVE-NPC-DEFAULT, WEAVE-NPC-EGRESS-CUSTOM]
WARNING:: Verification of iptables rules failed: 1
> Checking if DNS server is running on localhost
> Checking the number of DNS servers in resolv.conf
> Checking DNS entries for CDSW main domain
> Checking reverse DNS entries for CDSW main domain
> Checking DNS entries for CDSW wildcard domain
> Checking that firewalld is disabled
[Validating Kubernetes versions]
> Checking kubernetes client version
> Checking kubernetes server version
WARNING:: Kubernetes server is not running, version couldn't be checked.: 1
[Validating NFS and Application Block Device setup]
> Checking if nfs or nfs-server is active and enabled
> Checking if rpcbind.socket is active and enabled
> Checking if rpcbind.service is active and enabled
> Checking if the project folder is exported over nfs
WARNING:: The projects folder /var/lib/cdsw/current/projects must be exported over nfs: 1
> Checking if application mountpoint exists
> Checking if the application directory is on a separate block device
WARNING:: The application directory is mounted on the root device.: 1
> Checking the root directory (/) free space
> Checking the application directory (/var/lib/cdsw) free space
[Validating Kubernetes cluster state]
> Checking if we have exactly one master node
WARNING:: There must be exactly one Kubernetes node labelled 'stateful=true': 1
> Checking if the Kubernetes nodes are ready
> Checking kube-apiserver pod
WARNING: Unable to reach k8s pod kube-apiserver.
WARNING: [kube-apiserver] pod(s) are not ready under kube-system namespace.
WARNING: Unable to bring up kube-apiserver in the kube-system cluster. Skipping other checks..
[Validating CDSW application]
> Checking connectivity over ingress
WARNING:: Could not curl the application over the ingress controller: 7
--------------------------------------------------------------------------
Errors detected.
Please review the issues listed above. Further details can be collected by
capturing logs from all nodes using "cdsw logs".
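Most of the remaining warnings look like symptoms of a single root cause: the KUBE-* and WEAVE-NPC-* iptables chains, the 'stateful=true' node label, and the kube-apiserver pod all only exist once the Kubernetes control plane is up, so they point back to Kubernetes failing to start rather than to independent problems.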
Created 11-15-2019 06:05 AM
@VamshiDevraj please remove the old iptables rules from the host by running the commands below.
1. Stop the CDSW service.
2. Run the following commands on the master host (see the note after step 3 for what they do):
sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT
sudo iptables -t nat -F
sudo iptables -t mangle -F
sudo iptables -F
3. Start the Docker role on the master node; once it is up, start the Master role and then the Application role on the same node.
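A note on the commands in step 2: the three -P commands reset the default policy on the INPUT, FORWARD, and OUTPUT chains to ACCEPT, and the three -F commands flush all rules from the nat, mangle, and filter tables. To confirm the tables are clean afterwards, you can list what remains (standard iptables CLI; not part of the original steps):

sudo iptables -S        # after the flush, only policy (-P) and empty chain (-N) lines should remain
sudo iptables -t nat -S # same check for the nat table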
Run the cdsw status command and send the output here. It would also help if you could send the cdsw logs output from the node.
Created 11-15-2019 07:26 AM
Hi @GangWar,
Many thanks for the help. Here is the output of cdsw status and cdsw logs (I'm still getting errors):
[centos@cdsw ~]$ sudo cdsw status
Sending detailed logs to [/tmp/cdsw_status_DKZbM9.log] ...
CDSW Version: [1.6.0.1294376:46715e4]
Installed into namespace 'default'
OK: Application running as root check
OK: NFS service check
OK: System process check for CSD install
OK: Sysctl params check
OK: Kernel memory slabs check
Failed to run CDSW Nodes Check.
Failed to run CDSW system pods check.
Failed to run CDSW application pods check.
Failed to run CDSW services check.
Failed to run CDSW secrets check.
Failed to run CDSW persistent volumes check.
Failed to run CDSW persistent volumes claims check.
Failed to run CDSW Ingresses check.
Checking web at url: http://cdsw.ezydata.fr
Web is not yet up.
Cloudera Data Science Workbench is not ready yet
-------------------------------------------------------------------------------------------
[centos@cdsw ~]$ sudo cdsw logs
Generating Cloudera Data Science Workbench diagnostic bundle...
Collecting basic system info...
Collecting kernel parameters...
Collecting kernel messages...
Collecting the list of kernel modules...
Collecting the list of systemd units...
Collecting cdsw details...
Collecting application configuration...
Collecting disks information...
Collecting Hadoop configuration...
Collecting network information...
Collecting system service statuses...
Collecting nfs information...
Collecting Docker info...
Collecting Kubernetes info...
Collecting Helm info...
Collecting custom patches...
cp: cannot stat ‘/etc/cdsw/patches’: No such file or directory
Collecting Kubelet logs...
Collecting CDSW Host Controller logs...
Collecting system logs...
Collecting Kubernetes cluster info dump...
ls: cannot access cdsw-logs-cdsw-2019-11-15--15-21-38/k8s-cluster-info/*/*/logs.txt: No such file or directory
Exporting user ids...
The connection to the server 172.16.40.7:6443 was refused - did you specify the right host or port?
The connection to the server 172.16.40.7:6443 was refused - did you specify the right host or port?
error: pod name must be specified
Collecting health logs...
Redacting logs...
find: ‘cdsw-logs-cdsw-2019-11-15--15-21-38/k8s-cluster-info’: No such file or directory
find: ‘cdsw-logs-cdsw-2019-11-15--15-21-38/k8s-cluster-info’: No such file or directory
Producing redacted logs tarball...
Logs saved to: cdsw-logs-cdsw-2019-11-15--15-21-38.redacted.tar.gz
Cleaning up...
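For context on the two refused connections above: 6443 is the default secure port of the Kubernetes API server, so "connection refused" on 172.16.40.7:6443 is consistent with the earlier "Kubernetes server is not running" warning from cdsw validate.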
Created 11-15-2019 07:47 AM
The output is truncated. Can you upload the full output of cdsw status, or just attach the file
cdsw-logs-cdsw-2019-11-15--15-21-38.redacted.tar.gz here, so that I can look through the logs?
Also, please make sure you have followed all the prerequisites from here: https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_install.html#pre_in...
Make sure you have IPv6 enabled on the CDSW host. Take a look at the known issue here: https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_known_issues.html#k...
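A quick way to check the IPv6 state from the shell, using the standard kernel sysctls (0 means IPv6 is enabled, 1 means it is disabled):

sysctl net.ipv6.conf.all.disable_ipv6
sysctl net.ipv6.conf.default.disable_ipv6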
Created 11-18-2019 04:30 AM
https://drive.google.com/file/d/1LdbWohArkQaeYHnyu7VIOcbOciUkbEhR/view?usp=sharing
Hi @GangWar,
Please find the cdsw logs at the link above.
I've checked that IPv6 is enabled, and I believe I have taken care of all the prerequisites.
Any inputs are most welcome.
Regards,
Vamshi
Created on 11-19-2019 08:16 AM - edited 11-19-2019 08:25 AM
Hi,
I noticed that my Linux version is CentOS 7.7, which is not in the supported list for CDSW 1.6.x. Could this be causing the issue?
I'll retry with a supported OS release.
Created 11-19-2019 12:38 PM
Yes @VamshiDevraj. Please use a supported OS, as this could cause other issues.
Also, as previously said, I suspect your DNS is still not resolving correctly. Looking at the errors and the hosts file, there appears to be an issue there.
cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.40.3 master.ezydata.fr master
172.16.40.4 worker1.ezydata.fr worker1
172.16.40.5 worker2.ezydata.fr worker2
172.16.40.6 worker3.ezydata.fr worker3
172.16.40.7 cdsw.ezydata.fr cdws
172.16.40.7 *.cdsw.ezydata.fr
The 7th line is incorrect: the short name should be cdsw, not cdws.
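For reference, the corrected entry, plus a quick check that resolution now behaves as expected (getent queries the same NSS lookup path that applications use):

172.16.40.7 cdsw.ezydata.fr cdsw

getent hosts cdsw.ezydata.fr   # should print 172.16.40.7 followed by the names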
Created 11-28-2019 09:42 AM
@VamshiDevraj Does this resolve your issue? If yes, please mark this reply as the solution.