Member since 07-23-2019 | 9 Posts | 0 Kudos Received | 0 Solutions
11-19-2019
08:16 AM
Hi, I observed that my Linux version is CentOS 7.7, which is not in the supported list for CDSW 1.6.x. Could this be causing the issue? I'll try with a supported OS release.
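For reference, a quick way to confirm the exact release and kernel on the host (generic commands, nothing CDSW-specific):

cat /etc/redhat-release   # prints the CentOS release, e.g. 7.6 vs 7.7
uname -r                  # kernel version, also part of the CDSW requirements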
11-18-2019
04:30 AM
https://drive.google.com/file/d/1LdbWohArkQaeYHnyu7VIOcbOciUkbEhR/view?usp=sharing
Hi @GangWar, please find the CDSW logs at the link above. I've checked that IPv6 is enabled, and I believe I have taken care of all the prerequisites. Any inputs are most welcome. Regards, Vamshi
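PS: in case it helps, these are the generic checks I used to confirm IPv6 is not disabled (assuming IPv6 is controlled via sysctl here rather than a kernel boot parameter):

sysctl net.ipv6.conf.all.disable_ipv6       # 0 means IPv6 is enabled
sysctl net.ipv6.conf.default.disable_ipv6   # 0 means IPv6 is enabled for new interfaces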
11-15-2019
07:26 AM
Hi @GangWar, many thanks for the help. Here is the output of cdsw status and cdsw logs (I'm still getting issues):

[centos@cdsw ~]$ sudo cdsw status
Sending detailed logs to [/tmp/cdsw_status_DKZbM9.log] ...
CDSW Version: [1.6.0.1294376:46715e4]
Installed into namespace 'default'
OK: Application running as root check
OK: NFS service check
OK: System process check for CSD install
OK: Sysctl params check
OK: Kernel memory slabs check
Failed to run CDSW Nodes Check.
Failed to run CDSW system pods check.
Failed to run CDSW application pods check.
Failed to run CDSW services check.
Failed to run CDSW secrets check.
Failed to run CDSW persistent volumes check.
Failed to run CDSW persistent volumes claims check.
Failed to run CDSW Ingresses check.
Checking web at url: http://cdsw.ezydata.fr
Web is not yet up.
Cloudera Data Science Workbench is not ready yet
-------------------------------------------------------------------------------------------
[centos@cdsw ~]$ sudo cdsw logs
Generating Cloudera Data Science Workbench diagnostic bundle...
Collecting basic system info...
Collecting kernel parameters...
Collecting kernel messages...
Collecting the list of kernel modules...
Collecting the list of systemd units...
Collecting cdsw details...
Collecting application configuration...
Collecting disks information...
Collecting Hadoop configuration...
Collecting network information...
Collecting system service statuses...
Collecting nfs information...
Collecting Docker info...
Collecting Kubernetes info...
Collecting Helm info...
Collecting custom patches...
cp: cannot stat ‘/etc/cdsw/patches’: No such file or directory
Collecting Kubelet logs...
Collecting CDSW Host Controller logs...
Collecting system logs...
Collecting Kubernetes cluster info dump...
ls: cannot access cdsw-logs-cdsw-2019-11-15--15-21-38/k8s-cluster-info/*/*/logs.txt: No such file or directory
Exporting user ids...
The connection to the server 172.16.40.7:6443 was refused - did you specify the right host or port?
The connection to the server 172.16.40.7:6443 was refused - did you specify the right host or port?
error: pod name must be specified
Collecting health logs...
Redacting logs...
find: ‘cdsw-logs-cdsw-2019-11-15--15-21-38/k8s-cluster-info’: No such file or directory
find: ‘cdsw-logs-cdsw-2019-11-15--15-21-38/k8s-cluster-info’: No such file or directory
Producing redacted logs tarball...
Logs saved to: cdsw-logs-cdsw-2019-11-15--15-21-38.redacted.tar.gz
Cleaning up...
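The two "connection refused" errors against 172.16.40.7:6443 make me think the Kubernetes API server never came up at all. A few generic checks I'm running on the host (my assumption is that the bundled Kubernetes components run as Docker containers on this node, which may not match every install type):

sudo systemctl status docker      # Docker must be up before any Kubernetes containers can start
sudo docker ps | grep -i kube     # are any Kubernetes system containers running at all?
sudo ss -tlnp | grep 6443         # is anything actually listening on the API server port?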
11-14-2019
01:37 PM
Just an update on this: the DNS issue is now resolved, but there are still some Kubernetes-related errors. Does anyone have any pointers for me?

[centos@cdsw var]$ sudo cdsw validate
[Validating host configuration]
> Prechecking OS Version........[OK]
> Prechecking kernel Version........[OK]
> Prechecking that SELinux is disabled........[OK]
> Prechecking scaling limits for processes........[OK]
> Prechecking scaling limits for open files........
WARNING: Cloudera Data Science Workbench recommends that all users have a max-open-files limit set to 1048576.
It is currently set to [1024] as per 'ulimit -n'
Press enter to continue
> Loading kernel module [ip_tables]...
> Loading kernel module [iptable_nat]...
> Loading kernel module [iptable_filter]...
> Prechecking that iptables are not configured........[OK]
> Prechecking kernel parameters........[OK]
> Prechecking to ensure kernel memory accounting disabled:........[OK]
> Prechecking Java distribution and version........[OK]
> Checking unlimited Java encryption policy for AES........[OK]
> Prechecking size of root volume........[OK]

[Validating networking setup]
> Checking if kubelet iptables rules exist
The following chains are missing from iptables: [KUBE-EXTERNAL-SERVICES, WEAVE-NPC-EGRESS, WEAVE-NPC, WEAVE-NPC-EGRESS-ACCEPT, KUBE-SERVICES, WEAVE-NPC-INGRESS, WEAVE-NPC-EGRESS-DEFAULT, WEAVE-NPC-DEFAULT, WEAVE-NPC-EGRESS-CUSTOM]
WARNING:: Verification of iptables rules failed: 1
> Checking if DNS server is running on localhost
> Checking the number of DNS servers in resolv.conf
> Checking DNS entries for CDSW main domain
> Checking reverse DNS entries for CDSW main domain
> Checking DNS entries for CDSW wildcard domain
> Checking that firewalld is disabled

[Validating Kubernetes versions]
> Checking kubernetes client version
> Checking kubernetes server version
WARNING:: Kubernetes server is not running, version couldn't be checked.: 1

[Validating NFS and Application Block Device setup]
> Checking if nfs or nfs-server is active and enabled
> Checking if rpcbind.socket is active and enabled
> Checking if rpcbind.service is active and enabled
> Checking if the project folder is exported over nfs
WARNING:: The projects folder /var/lib/cdsw/current/projects must be exported over nfs: 1
> Checking if application mountpoint exists
> Checking if the application directory is on a separate block device
WARNING:: The application directory is mounted on the root device.: 1
> Checking the root directory (/) free space
> Checking the application directory (/var/lib/cdsw) free space

[Validating Kubernetes cluster state]
> Checking if we have exactly one master node
WARNING:: There must be exactly one Kubernetes node labelled 'stateful=true': 1
> Checking if the Kubernetes nodes are ready
> Checking kube-apiserver pod
WARNING: Unable to reach k8s pod kube-apiserver.
WARNING: [kube-apiserver] pod(s) are not ready under kube-system namespace.
WARNING: Unable to bring up kube-apiserver in the kube-system cluster. Skipping other checks..

[Validating CDSW application]
> Checking connectivity over ingress
WARNING:: Could not curl the application over the ingress controller: 7

--------------------------------------------------------------------------
Errors detected.
Please review the issues listed above. Further details can be collected by capturing logs from all nodes using "cdsw logs".
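For the max-open-files warning, my understanding is that the limit can be raised in /etc/security/limits.conf and picked up on the next login (assuming pam_limits is active, which is the CentOS 7 default):

# /etc/security/limits.conf
*    soft    nofile    1048576
*    hard    nofile    1048576

# verify after a fresh login
ulimit -n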
11-13-2019
09:08 AM
All,
I'm getting errors while installing CDSW in my cluster.
Below is the output of the cdsw validate command.
[centos@cdsw ~]$ sudo cdsw validate
[Validating host configuration]
> Prechecking OS Version........[OK]
> Prechecking kernel Version........[OK]
> Prechecking that SELinux is disabled........[OK]
> Prechecking scaling limits for processes........[OK]
> Prechecking scaling limits for open files........
WARNING: Cloudera Data Science Workbench recommends that all users have a max-open-files limit set to 1048576.
It is currently set to [1024] as per 'ulimit -n'
Press enter to continue
> Loading kernel module [ip_tables]...
> Loading kernel module [iptable_nat]...
> Loading kernel module [iptable_filter]...
> Prechecking that iptables are not configured........
WARNING: Cloudera Data Science Workbench requires iptables, but does not support preexisting iptables rules.
Press enter to continue
> Prechecking kernel parameters........[OK]
> Prechecking to ensure kernel memory accounting disabled:........[OK]
> Prechecking Java distribution and version........[OK]
> Checking unlimited Java encryption policy for AES........[OK]
> Prechecking size of root volume........[OK]

[Validating networking setup]
> Checking if kubelet iptables rules exist
The following chains are missing from iptables: [KUBE-EXTERNAL-SERVICES, WEAVE-NPC-EGRESS, WEAVE-NPC, WEAVE-NPC-EGRESS-ACCEPT, KUBE-SERVICES, WEAVE-NPC-INGRESS, WEAVE-NPC-EGRESS-DEFAULT, WEAVE-NPC-DEFAULT, WEAVE-NPC-EGRESS-CUSTOM, KUBE-FIREWALL]
WARNING:: Verification of iptables rules failed: 1
> Checking if DNS server is running on localhost
> Checking the number of DNS servers in resolv.conf
> Checking DNS entries for CDSW main domain
WARNING:: DNS doesn't resolve cdsw.company.fr to cdsw.company.fr; DNS is not configured properly: 1
> Checking reverse DNS entries for CDSW main domain
WARNING:: DNS doesn't resolve cdsw.company.fr to cdsw.company.fr; DNS is not configured properly: 1
> Checking DNS entries for CDSW wildcard domain
WARNING:: DNS doesn't resolve *.cdsw.company.fr to cdsw.company.fr; DNS is not configured properly: 1
> Checking that firewalld is disabled

[Validating Kubernetes versions]
> Checking kubernetes client version
> Checking kubernetes server version
WARNING:: Kubernetes server is not running, version couldn't be checked.: 1

[Validating NFS and Application Block Device setup]
> Checking if nfs or nfs-server is active and enabled
> Checking if rpcbind.socket is active and enabled
> Checking if rpcbind.service is active and enabled
> Checking if the project folder is exported over nfs
WARNING:: The projects folder /var/lib/cdsw/current/projects must be exported over nfs: 1
> Checking if application mountpoint exists
> Checking if the application directory is on a separate block device
WARNING:: The application directory is mounted on the root device.: 1
> Checking the root directory (/) free space
> Checking the application directory (/var/lib/cdsw) free space

[Validating Kubernetes cluster state]
> Checking if we have exactly one master node
WARNING:: There must be exactly one Kubernetes node labelled 'stateful=true': 1
> Checking if the Kubernetes nodes are ready
> Checking kube-apiserver pod
WARNING: Unable to reach k8s pod kube-apiserver.
WARNING: [kube-apiserver] pod(s) are not ready under kube-system namespace.
WARNING: Unable to bring up kube-apiserver in the kube-system cluster. Skipping other checks..

[Validating CDSW application]
> Checking connectivity over ingress
WARNING:: Could not curl the application over the ingress controller: 7

--------------------------------------------------------------------------
Errors detected.
Please review the issues listed above. Further details can be collected by capturing logs from all nodes using "cdsw logs".
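Side note on the "does not support preexisting iptables rules" warning above: these are the commands I used to review and clear the old rules before re-running the install (my assumption is that nothing else on this host depends on those rules):

sudo iptables -L -n -v            # review the currently loaded rules first
sudo iptables -F                  # flush the filter table
sudo iptables -t nat -F           # flush the nat table
sudo systemctl stop firewalld     # CDSW requires firewalld to be off
sudo systemctl disable firewalld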
------------------------------------------------------------------------------------------------------------------------------
Error 1) "DNS is unable to resolve the hostname": I have the following entries in the /etc/hosts file (apologies if I've missed something elementary here):
[centos@cdsw ~]$ sudo cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.40.3 master.company.fr master
172.16.40.4 worker1.company.fr worker1
172.16.40.5 worker2.company.fr worker2
172.16.40.6 worker3.company.fr worker3
172.16.40.7 cdsw.company.fr cdws
172.16.40.7 *.cdsw.company.fr
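One thing I have since learned: the wildcard line has no effect in /etc/hosts, so *.cdsw.company.fr has to be served by a real DNS server. A minimal sketch using dnsmasq (hypothetical config for this environment; each host's resolver would also need to point at the dnsmasq instance):

# /etc/dnsmasq.conf -- answers cdsw.company.fr and every *.cdsw.company.fr with the CDSW host IP
address=/cdsw.company.fr/172.16.40.7

sudo systemctl restart dnsmasq    # reload dnsmasq after the change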
Error 2) "Unable to reach k8s pod kube-apiserver"
Does anyone know what is causing this issue?
Many thanks,
11-12-2019
03:36 PM
Hi @bgooley, I see your point. I realized just now that the JDK on the gateway node is 1.6, while all the other nodes are on JDK 1.8. This could indeed be causing the problem. I'll upgrade to JDK 1.8 first and try again later. Thank you so much. Regards, Vamshi
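PS: these are the generic commands I'll run on each node to confirm which JDK it actually resolves to (nothing CDH-specific):

java -version                     # reports the active JDK version
readlink -f "$(which java)"       # shows which installation the java binary points to
alternatives --display java       # lists all JDKs registered on the node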
11-12-2019
02:42 PM
[centos@cdws user]$ hostname -f
cdws.company.fr
[centos@cdws user]$ kinit
Password for centos@COMPANY.COM:
[centos@cdws user]$ klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: centos@COMPANY.COM
Valid starting       Expires              Service principal
11/12/2019 21:13:35  11/13/2019 21:13:35  krbtgt/COMPANY.COM@COMPANY.COM

Below is the HDFS command:

[centos@cdws user]$ hdfs dfs -ls /user/centos
19/11/12 22:29:28 WARN security.UserGroupInformation: PriviledgedActionException as:centos (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
19/11/12 22:29:28 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
19/11/12 22:29:28 WARN security.UserGroupInformation: PriviledgedActionException as:centos (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
ls: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "cdws.company.fr/172.16.40.7"; destination host is: "master.company.fr":8020;

[centos@cdws user]$ sudo yum list installed | grep krb5
krb5-devel.x86_64        1.15.1-37.el7_7.2   @updates
krb5-libs.x86_64         1.15.1-37.el7_7.2   @updates
krb5-workstation.x86_64  1.15.1-37.el7_7.2   @updates

[centos@cdws user]$ sudo cat /etc/krb5.conf
[libdefaults]
default_realm = COMPANY.COM
dns_lookup_kdc = false
dns_lookup_realm = false
ticket_lifetime = 86400
renew_lifetime = 604800
forwardable = true
default_tgs_enctypes = aes256-cts-hmac-sha1-96
default_tkt_enctypes = aes256-cts-hmac-sha1-96
permitted_enctypes = aes256-cts-hmac-sha1-96
udp_preference_limit = 1
kdc_timeout = 3000
[realms]
COMPANY.COM = {
kdc = worker3.company.fr
admin_server = worker3.company.fr
}
[domain_realm]

------------------------------------------------------------------------

However, one thing worth noting is that I don't see any principal for "cdws.company.fr" in the Kerberos Credentials tab in Cloudera Manager. Could this be causing the issue, and how could I resolve it? Many thanks,
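Edit: two generic checks I ran while debugging, in case they are relevant (my working assumption was an encryption-type mismatch between the cached ticket and the JDK):

klist -e         # shows the encryption types of the cached tickets
java -version    # older JDKs need the JCE unlimited-strength policy files for AES-256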
11-12-2019
01:14 PM
All,
I have an HDFS/Spark/YARN gateway node that has been assigned these roles via Cloudera Manager. However, Kerberos is preventing me from accessing HDFS from the gateway node. I can generate a TGT via the kinit command, and I can access HDFS from other nodes in the Hadoop cluster using the same user ID and the same steps.
All the Kerberos configuration files on the gateway node seem to be in place. Any suggestions on what I might have missed?
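In the meantime I'm planning to rerun the failing command with Kerberos debug output enabled (a generic JVM flag, nothing CDH-specific):

export HADOOP_OPTS="-Dsun.security.krb5.debug=true"
hdfs dfs -ls /user/centos    # rerun the failing command; the extra output shows where the GSS handshake fails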
Labels:
Apache Hadoop