
CDSW installation issue


All,

I'm getting errors while installing CDSW in my cluster.

Below is the output of cdsw validate command.

 

[centos@cdsw ~]$ sudo cdsw validate
[Validating host configuration]
> Prechecking OS Version........[OK]
> Prechecking kernel Version........[OK]
> Prechecking that SELinux is disabled........[OK]
> Prechecking scaling limits for processes........[OK]
> Prechecking scaling limits for open files........
WARNING: Cloudera Data Science Workbench recommends that all users have a max-open-files limit set to 1048576.
It is currently set to [1024] as per 'ulimit -n'
Press enter to continue
> Loading kernel module [ip_tables]...
> Loading kernel module [iptable_nat]...
> Loading kernel module [iptable_filter]...
> Prechecking that iptables are not configured........
WARNING: Cloudera Data Science Workbench requires iptables, but does not support preexisting iptables rules. Press enter to continue
> Prechecking kernel parameters........[OK]
> Prechecking to ensure kernel memory accounting disabled:........[OK]
> Prechecking Java distribution and version........[OK]
> Checking unlimited Java encryption policy for AES........[OK]
> Prechecking size of root volume........[OK]

[Validating networking setup]
> Checking if kubelet iptables rules exist
The following chains are missing from iptables: [KUBE-EXTERNAL-SERVICES, WEAVE-NPC-EGRESS, WEAVE-NPC, WEAVE-NPC-EGRESS-ACCEPT, KUBE-SERVICES, WEAVE-NPC-INGRESS, WEAVE-NPC-EGRESS-DEFAULT, WEAVE-NPC-DEFAULT, WEAVE-NPC-EGRESS-CUSTOM, KUBE-FIREWALL]
WARNING:: Verification of iptables rules failed: 1
> Checking if DNS server is running on localhost
> Checking the number of DNS servers in resolv.conf
> Checking DNS entries for CDSW main domain
WARNING:: DNS doesn't resolve cdsw.company.fr to cdsw.company.fr; DNS is not configured properly: 1
> Checking reverse DNS entries for CDSW main domain
WARNING:: DNS doesn't resolve cdsw.company.fr to cdsw.company.fr; DNS is not configured properly: 1
> Checking DNS entries for CDSW wildcard domain
WARNING:: DNS doesn't resolve *.cdsw.company.fr to cdsw.company.fr; DNS is not configured properly: 1
> Checking that firewalld is disabled

[Validating Kubernetes versions]
> Checking kubernetes client version
> Checking kubernetes server version
WARNING:: Kubernetes server is not running, version couldn't be checked.: 1

[Validating NFS and Application Block Device setup]
> Checking if nfs or nfs-server is active and enabled
> Checking if rpcbind.socket is active and enabled
> Checking if rpcbind.service is active and enabled
> Checking if the project folder is exported over nfs
WARNING:: The projects folder /var/lib/cdsw/current/projects must be exported over nfs: 1
> Checking if application mountpoint exists
> Checking if the application directory is on a separate block device
WARNING:: The application directory is mounted on the root device.: 1
> Checking the root directory (/) free space
> Checking the application directory (/var/lib/cdsw) free space

[Validating Kubernetes cluster state]
> Checking if we have exactly one master node
WARNING:: There must be exactly one Kubernetes node labelled 'stateful=true': 1
> Checking if the Kubernetes nodes are ready
> Checking kube-apiserver pod
WARNING: Unable to reach k8s pod kube-apiserver.
WARNING: [kube-apiserver] pod(s) are not ready under kube-system namespace.
WARNING: Unable to bring up kube-apiserver in the kube-system cluster. Skipping other checks..

[Validating CDSW application]
> Checking connectivity over ingress
WARNING:: Could not curl the application over the ingress controller: 7

--------------------------------------------------------------------------
Errors detected.

Please review the issues listed above. Further details can be collected by
capturing logs from all nodes using "cdsw logs".

------------------------------------------------------------------------------------------------------------------------------
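On the max-open-files warning above: the validator recommends a limit of 1048576, while the host reports 1024. A minimal sketch of raising it on CentOS 7, assuming the limit is controlled by /etc/security/limits.conf and not overridden elsewhere:

# /etc/security/limits.conf - raise the open-files limit to the recommended value
*    soft    nofile    1048576
*    hard    nofile    1048576

# log out and back in, then verify
ulimit -n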


Error 1) "DNS is unable to resolve the hostname": I have the following entries in the /etc/hosts file (apologies if I've missed something elementary here):

[centos@cdsw ~]$ sudo cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.40.3 master.company.fr master
172.16.40.4 worker1.company.fr worker1
172.16.40.5 worker2.company.fr worker2
172.16.40.6 worker3.company.fr worker3
172.16.40.7 cdsw.company.fr cdws
172.16.40.7 *.cdsw.company.fr
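For reference, a quick sketch of how these entries can be checked from the CDSW host. nslookup queries the configured DNS server, getent also consults /etc/hosts, and "test" below is just an arbitrary label to exercise the wildcard (wildcard entries in /etc/hosts are generally not honoured by the resolver, so *.cdsw.company.fr normally has to come from a DNS server):

# forward lookup of the main domain - should return 172.16.40.7
nslookup cdsw.company.fr

# any subdomain should resolve to the same address for the wildcard check
nslookup test.cdsw.company.fr

# reverse lookup - should return cdsw.company.fr
nslookup 172.16.40.7

# what /etc/hosts (via NSS) says about the name
getent hosts cdsw.company.fr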

 

Error 2) "Unable to reach k8s pod kube-apiserver"

Does anyone know what is causing these issues?

 

Many thanks, 

 



Just an update on this: the DNS issue is now resolved, but there are still some Kubernetes-related errors. Does anyone have any pointers for me?

 

[centos@cdsw var]$ sudo cdsw validate
[Validating host configuration]
> Prechecking OS Version........[OK]
> Prechecking kernel Version........[OK]
> Prechecking that SELinux is disabled........[OK]
> Prechecking scaling limits for processes........[OK]
> Prechecking scaling limits for open files........
WARNING: Cloudera Data Science Workbench recommends that all users have a max-open-files limit set to 1048576.
It is currently set to [1024] as per 'ulimit -n'
Press enter to continue
> Loading kernel module [ip_tables]...
> Loading kernel module [iptable_nat]...
> Loading kernel module [iptable_filter]...
> Prechecking that iptables are not configured........[OK]
> Prechecking kernel parameters........[OK]
> Prechecking to ensure kernel memory accounting disabled:........[OK]
> Prechecking Java distribution and version........[OK]
> Checking unlimited Java encryption policy for AES........[OK]
> Prechecking size of root volume........[OK]

[Validating networking setup]
> Checking if kubelet iptables rules exist
The following chains are missing from iptables: [KUBE-EXTERNAL-SERVICES, WEAVE-NPC-EGRESS, WEAVE-NPC, WEAVE-NPC-EGRESS-ACCEPT, KUBE-SERVICES, WEAVE-NPC-INGRESS, WEAVE-NPC-EGRESS-DEFAULT, WEAVE-NPC-DEFAULT, WEAVE-NPC-EGRESS-CUSTOM]
WARNING:: Verification of iptables rules failed: 1
> Checking if DNS server is running on localhost
> Checking the number of DNS servers in resolv.conf
> Checking DNS entries for CDSW main domain
> Checking reverse DNS entries for CDSW main domain
> Checking DNS entries for CDSW wildcard domain
> Checking that firewalld is disabled

[Validating Kubernetes versions]
> Checking kubernetes client version
> Checking kubernetes server version
WARNING:: Kubernetes server is not running, version couldn't be checked.: 1

[Validating NFS and Application Block Device setup]
> Checking if nfs or nfs-server is active and enabled
> Checking if rpcbind.socket is active and enabled
> Checking if rpcbind.service is active and enabled
> Checking if the project folder is exported over nfs
WARNING:: The projects folder /var/lib/cdsw/current/projects must be exported over nfs: 1
> Checking if application mountpoint exists
> Checking if the application directory is on a separate block device
WARNING:: The application directory is mounted on the root device.: 1
> Checking the root directory (/) free space
> Checking the application directory (/var/lib/cdsw) free space

[Validating Kubernetes cluster state]
> Checking if we have exactly one master node
WARNING:: There must be exactly one Kubernetes node labelled 'stateful=true': 1
> Checking if the Kubernetes nodes are ready
> Checking kube-apiserver pod
WARNING: Unable to reach k8s pod kube-apiserver.
WARNING: [kube-apiserver] pod(s) are not ready under kube-system namespace.
WARNING: Unable to bring up kube-apiserver in the kube-system cluster. Skipping other checks..

[Validating CDSW application]
> Checking connectivity over ingress
WARNING:: Could not curl the application over the ingress controller: 7

--------------------------------------------------------------------------
Errors detected.

Please review the issues listed above. Further details can be collected by
capturing logs from all nodes using "cdsw logs".

 

Master Guru

@VamshiDevraj please remove the old iptables rules from the host by running the commands below.

1. Stop the CDSW service.
2. Run the following commands on the master host (a quick verification sketch follows after step 3):
sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT
sudo iptables -t nat -F
sudo iptables -t mangle -F
sudo iptables -F
3. Start the Docker role on the master node; once it is up, start the Master role on the master node, and then the Application role on the master node.
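Once flushed, a minimal verification sketch (each table should show chains with policy ACCEPT and no remaining rules):

sudo iptables -L -n -v
sudo iptables -t nat -L -n -v
sudo iptables -t mangle -L -n -v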

Then run the cdsw status command and post the output here. It would also help if you could share the cdsw logs output from the node.


Cheers!


Hi @GangWar,

 

Many thanks for the help. Here is the output of cdsw status and cdsw logs (I'm still getting errors):

[centos@cdsw ~]$ sudo cdsw status
Sending detailed logs to [/tmp/cdsw_status_DKZbM9.log] ...
CDSW Version: [1.6.0.1294376:46715e4]
Installed into namespace 'default'
OK: Application running as root check
OK: NFS service check
OK: System process check for CSD install
OK: Sysctl params check
OK: Kernel memory slabs check
Failed to run CDSW Nodes Check.
Failed to run CDSW system pods check.
Failed to run CDSW application pods check.
Failed to run CDSW services check.
Failed to run CDSW secrets check.
Failed to run CDSW persistent volumes check.
Failed to run CDSW persistent volumes claims check.
Failed to run CDSW Ingresses check.
Checking web at url: http://cdsw.ezydata.fr
Web is not yet up.
Cloudera Data Science Workbench is not ready yet

-------------------------------------------------------------------------------------------

 

[centos@cdsw ~]$ sudo cdsw logs
Generating Cloudera Data Science Workbench diagnostic bundle...
Collecting basic system info...
Collecting kernel parameters...
Collecting kernel messages...
Collecting the list of kernel modules...
Collecting the list of systemd units...
Collecting cdsw details...
Collecting application configuration...
Collecting disks information...
Collecting Hadoop configuration...
Collecting network information...
Collecting system service statuses...
Collecting nfs information...
Collecting Docker info...
Collecting Kubernetes info...
Collecting Helm info...
Collecting custom patches...
cp: cannot stat ‘/etc/cdsw/patches’: No such file or directory
Collecting Kubelet logs...
Collecting CDSW Host Controller logs...
Collecting system logs...
Collecting Kubernetes cluster info dump...
ls: cannot access cdsw-logs-cdsw-2019-11-15--15-21-38/k8s-cluster-info/*/*/logs.txt: No such file or directory
Exporting user ids...
The connection to the server 172.16.40.7:6443 was refused - did you specify the right host or port?
The connection to the server 172.16.40.7:6443 was refused - did you specify the right host or port?
error: pod name must be specified
Collecting health logs...
Redacting logs...
find: ‘cdsw-logs-cdsw-2019-11-15--15-21-38/k8s-cluster-info’: No such file or directory
find: ‘cdsw-logs-cdsw-2019-11-15--15-21-38/k8s-cluster-info’: No such file or directory

Producing redacted logs tarball...
Logs saved to: cdsw-logs-cdsw-2019-11-15--15-21-38.redacted.tar.gz
Cleaning up...
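The repeated "connection refused" on 172.16.40.7:6443 in the output above suggests the kube-apiserver is not listening at all. A rough diagnostic sketch, assuming the docker CLI on the master can talk to the CDSW-managed daemon:

# is anything listening on the Kubernetes API server port?
sudo ss -ltnp | grep 6443

# are any Kubernetes system containers running?
sudo docker ps | grep -i kube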

 

Master Guru

The output is truncated. Can you upload the full output of cdsw status, or just attach the file cdsw-logs-cdsw-2019-11-15--15-21-38.redacted.tar.gz here so that I can see the logs?

 

Also, please make sure you have followed all the prerequisites from here: https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_install.html#pre_in...

Make sure you have IPv6 enabled on the CDSW host. Take a look at the known issue here: https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_known_issues.html#k...
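A minimal sketch for checking the IPv6 state mentioned above (both values should be 0 if IPv6 is enabled):

sysctl net.ipv6.conf.all.disable_ipv6
sysctl net.ipv6.conf.default.disable_ipv6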



Cheers!


https://drive.google.com/file/d/1LdbWohArkQaeYHnyu7VIOcbOciUkbEhR/view?usp=sharing

 

Hi @GangWar

Please find attached the cdsw logs in the link. 

I've checked that IPv6 is enabled, and I believe I have taken care of all the prerequisites.

 

Any input is most welcome.

Regards,

Vamshi


Hi,

I noticed that my Linux version is CentOS 7.7, which is not in the supported list for CDSW 1.6.x. Could this be what is causing the issue?

I'll try again with a supported OS release.


Master Guru

Yes, @VamshiDevraj. Please use a supported OS, as this could cause other issues.

Also, as mentioned previously, I suspect your DNS is not resolving correctly. Looking at the errors and the hosts file, there appears to be an issue there.

cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.40.3 master.ezydata.fr master
172.16.40.4 worker1.ezydata.fr worker1
172.16.40.5 worker2.ezydata.fr worker2
172.16.40.6 worker3.ezydata.fr worker3
172.16.40.7 cdsw.ezydata.fr cdws
172.16.40.7 *.cdsw.ezydata.fr

The 7th line is incorrect; the short name should be cdsw, not cdws.
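In other words, that entry would presumably need to read:

172.16.40.7 cdsw.ezydata.fr cdsw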

  • Please correct this and follow the steps above to remove the iptables rules.
  • Check that DNS resolution is working correctly.
  • If DNS works, go to CM and run Prepare Node first.
  • Start the CDSW service.

Cheers!

Master Guru

@VamshiDevraj Does this resolve your issue? If yes, please mark this reply as the solution.


Cheers!