Member since: 05-07-2020 | Posts: 32 | Kudos Received: 0 | Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1823 | 06-23-2020 01:13 AM
02-11-2022 02:46 AM
May I ask for advice on where to post a question to the account team, e.g. to change my profile email address (within the same company) to replace one that no longer exists (the domain name has changed)? Thanks.
09-03-2020 06:22 AM
@GangWar The problem with crashing/exiting pods is now fixed. After restoring the CDSW master host, I had by mistake provisioned its MASTER_IP in the CM config as the address resolved by DNS from the CDSW FQDN; it should instead be the host's private IP address within the Cloudera cluster. The immediate problem is therefore solved. May I then kindly ask for further assistance in troubleshooting the original issue with HDFS access from CDSW sessions?
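As a quick sanity check for this kind of MASTER_IP mixup (a minimal sketch, not part of the original post), Python's `ipaddress` module can tell whether a candidate address is private (RFC 1918), which is what a cluster-internal MASTER_IP should be:

```python
import ipaddress

def is_private_ip(addr: str) -> bool:
    """Return True if addr is a private (RFC 1918) or loopback address."""
    return ipaddress.ip_address(addr).is_private

# The cluster-internal address seen in the logs is private and suitable
# as MASTER_IP; a DNS-resolved external address typically is not.
print(is_private_ip("10.133.210.200"))  # True
print(is_private_ip("8.8.8.8"))         # False
```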
09-03-2020 12:33 AM
I do not see any successful host registrations. Please see below the tail of the process logs.

[root@cdsw-master-01 ~]# tail -n 10 /var/run/cloudera-scm-agent/process/19{09..11}*/logs/stderr.log
==> /var/run/cloudera-scm-agent/process/1909-cdsw-CDSW_DOCKER/logs/stderr.log <==
time="2020-09-03T07:00:55.018288801Z" level=error msg="Handler for GET /containers/12437b8b7b3b452bc7bfe8a3a26fe253de38601b7dd5093bd3d67a8f52b50e6b/json returned error: write unix /var/run/docker.sock->@: write: broken pipe"
2020-09-03 07:00:55.018357 I | http: multiple response.WriteHeader calls
time="2020-09-03T07:01:12.350659606Z" level=info msg="stopping containerd after receiving terminated"
time="2020-09-03T07:01:12.351645251Z" level=info msg="Processing signal 'terminated'"
time="2020-09-03T07:01:12.352045287Z" level=error msg="libcontainerd: failed to receive event from containerd: rpc error: code = 13 desc = transport is closing"
time="2020-09-03T07:01:13.187239486Z" level=info msg="libcontainerd: new containerd process, pid: 9176"
time="2020-09-03T07:01:13.206461276Z" level=error msg="containerd: notify OOM events" error="open /proc/8671/cgroup: no such file or directory"
time="2020-09-03T07:01:13.206730882Z" level=error msg="containerd: notify OOM events" error="open /proc/8808/cgroup: no such file or directory"
time="2020-09-03T07:01:13.206985589Z" level=error msg="containerd: notify OOM events" error="open /proc/8995/cgroup: no such file or directory"
time="2020-09-03T07:01:13.904988075Z" level=info msg="stopping containerd after receiving terminated"
==> /var/run/cloudera-scm-agent/process/1910-cdsw-CDSW_MASTER/logs/stderr.log <==
E0903 07:00:54.262100 31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.362293 31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.462458 31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.480206 31064 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.133.210.200:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dexternal-ip&limit=500&resourceVersion=0: dial tcp 10.133.210.200:6443: connect: connection refused
E0903 07:00:54.480889 31064 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://10.133.210.200:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.133.210.200:6443: connect: connection refused
E0903 07:00:54.481951 31064 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://10.133.210.200:6443/api/v1/nodes?fieldSelector=metadata.name%3Dexternal-ip&limit=500&resourceVersion=0: dial tcp 10.133.210.200:6443: connect: connection refused
E0903 07:00:54.562631 31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.662826 31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.763006 31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.863203 31064 kubelet.go:2266] node "external-ip" not found
==> /var/run/cloudera-scm-agent/process/1911-cdsw-CDSW_APPLICATION/logs/stderr.log <==
func(*targs, **kargs)
File "/opt/cloudera/parcels/CDSW-1.7.2.p1.2066404/cdsw_admin/cdsw/admin.py", line 63, in stop
os.killpg(os.getpid(), signal.SIGKILL)
OSError: [Errno 3] No such process
+ is_kubelet_process_up
+ is_kube_cluster_configured
+ '[' -e /etc/kubernetes/admin.conf ']'
+ return 0
++ KUBECONFIG=/etc/kubernetes/kubelet.conf
++ /opt/cloudera/parcels/CDSW-1.7.2.p1.2066404/kubernetes/bin/kubectl get nodes
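To triage kubelet logs like the ones above, the unreachable API endpoints can be pulled out programmatically. This is a small illustrative sketch (not from the original thread) run against a shortened sample line:

```python
import re

def unreachable_endpoints(log_lines):
    """Collect unique host:port endpoints that refused TCP connections."""
    pat = re.compile(r"dial tcp ([\d.]+:\d+): connect: connection refused")
    seen = []
    for line in log_lines:
        m = pat.search(line)
        if m and m.group(1) not in seen:
            seen.append(m.group(1))
    return seen

sample = [
    "E0903 07:00:54.480206 reflector.go:125] Failed to list *v1.Pod: "
    "dial tcp 10.133.210.200:6443: connect: connection refused",
]
print(unreachable_endpoints(sample))  # ['10.133.210.200:6443']
```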
09-02-2020 11:54 PM
@GangWar I have changed the kubelet parameter in /opt/cloudera/parcels/CDSW/scripts/start-kubelet-master-standalone-core.sh as suggested:

#kubelet_opts+=(--hostname-override=${master_hostname_lower})
kubelet_opts+=(--hostname-override=external-ip)

Unfortunately the pods (kube-apiserver, kube-scheduler, etcd) keep crashing/exiting.
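The kubelet errors quoted earlier (node "external-ip" not found) suggest the override no longer matches the name under which the node registered. A tiny hypothetical helper to compare the two (the function name and logic are illustrative only, not part of CDSW):

```python
def override_matches_node(override: str, registered_nodes) -> bool:
    """True if a kubelet --hostname-override matches one of the
    registered node names (compared case-insensitively here for clarity;
    Kubernetes itself lowercases node names on registration)."""
    return override.lower() in (n.lower() for n in registered_nodes)

# An override of "external-ip" cannot match a node registered under its FQDN:
print(override_matches_node("external-ip", ["cdsw-master-01.example.com"]))  # False
print(override_matches_node("CDSW-Master-01.example.com",
                            ["cdsw-master-01.example.com"]))                 # True
```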
09-02-2020 07:22 AM
I have performed some further troubleshooting. As per the CDSW master-docker process stderr.log, there might be a problem with Kubernetes DNS resolution due to missing weave containers for pod networking. Indeed, a DNS lookup cannot resolve one of the container repository FQDNs, docker-registry.infra.cloudera.com, which is supposed to host the weave containers. Could you verify and confirm whether that is the root cause?
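Such a resolution failure can be checked in isolation with a minimal sketch (run it against docker-registry.infra.cloudera.com on the CDSW host; the names below are placeholders for a self-contained demonstration):

```python
import socket

def resolves(hostname: str) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False

print(resolves("localhost"))             # True
print(resolves("no-such-host.invalid"))  # False (.invalid never resolves)
```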
08-24-2020 01:59 AM
@GangWar I followed the steps jointly with a Cloudera representative (Kamel D). Unfortunately the problem persists: several containers keep exiting.
08-17-2020 01:17 AM
@GangWar Which command should I run manually from the terminal, on which cluster hosts, and at which point in the overall procedure of adding the CDSW service to the cluster? Nonetheless, I have removed the CDSW roles and host from the cluster and Cloudera Manager, created another clean VM, adjusted its configuration to meet the requirements, and added the CDSW service and its roles back on the new host. Unfortunately the CDSW service reports the same errors as before and the web GUI is not accessible. The docker-thinpool logical volume has been created successfully, however the containers keep crashing/exiting:
08-12-2020 01:09 AM
@GangWar I am confused: earlier you wrote that CDSW does not care about the /etc/hosts file, and now that the short names should be declared in /etc/hosts. Which statement is correct? Notwithstanding that, if the CDSW hosts are managed by Cloudera Manager, shouldn't the latter take care of the relevant configuration of all the cluster hosts? In other words, if the CDH hosts in the cluster communicate correctly with the HDFS NameNodes based on the hdfs-site.xml config file, why don't the CDSW hosts?

Unfortunately, the CDSW master host then crashed and I was unable to restore it through Cloudera Manager. I tried to solve this by removing the CDSW service from the cluster, removing the CDSW host completely from the cluster, destroying the VM and creating a new one for the CDSW master, redeploying the prerequisites on it, and adding it back to CM and the cluster.

However, now the problem is with adding the CDSW service back to the cluster: the procedure gets stuck while running /opt/cloudera/parcels/CDSW/scripts/create-docker-thinpool.sh. It hangs at the command:

lvcreate --wipesignatures y -n thinpool docker -l 95%VG

The procedure to add the CDSW service continues and completes only if I manually terminate the hanging lvcreate process in the CLI (kill -2 <pid>). However, the Docker daemon then seems to malfunction, as several service pods do not come up, incl. the CDSW web GUI.
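The manual kill -2 step amounts to sending SIGINT to the hung process. A minimal Python equivalent of that workaround (illustrative only; a dummy sleep stands in for the hanging lvcreate):

```python
import os
import signal
import subprocess
import time

# Stand-in for the hanging lvcreate process: a long-running dummy sleep.
proc = subprocess.Popen(["sleep", "300"])
time.sleep(0.2)  # give it a moment to start

# Equivalent of `kill -2 <pid>`: deliver SIGINT to terminate it.
os.kill(proc.pid, signal.SIGINT)
proc.wait()
print(proc.returncode)  # -2 on Linux: the process was terminated by SIGINT
```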
08-03-2020 02:21 AM
@GangWar In which Oozie service configuration item in Cloudera Manager should this be defined?
08-03-2020 01:31 AM
Actually, on a clean CentOS 7.6 a simple pip install numpy does not work; the command returns RuntimeError: Python version >= 3.6 required. I had to upgrade pip first, change the default permission mask (if installing system-wide as root; otherwise the installed numpy package is not readable by non-root users), and only then install numpy:

# pip install --upgrade pip
Collecting pip
[...]
# umask 022; pip install numpy

Nonetheless, this workaround is not scalable (it should be managed cluster-wide from Cloudera Manager, not the command line), and it runs contrary to Python/pip best practices (pip should not be used for system-wide (root) package installations). Hence I am still looking for a solution for how to make the PySpark script use the Anaconda Python on the cluster nodes.
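The umask part of the workaround can be demonstrated in isolation. This small sketch (file name and location are arbitrary) sets umask 022 and verifies that a freshly created file keeps read access for group and other, which is what makes a root-installed package readable by non-root users:

```python
import os
import stat
import tempfile

old_mask = os.umask(0o022)  # mimic `umask 022` from the shell workaround
path = os.path.join(tempfile.gettempdir(), "umask_demo.txt")
if os.path.exists(path):
    os.remove(path)  # ensure creation mode is applied, not an old file's mode

with open(path, "w") as f:  # fresh file gets mode 0o666 & ~0o022 = 0o644
    f.write("readable by non-root users\n")

mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))  # 0o644: group and other retain read permission

os.umask(old_mask)  # restore the previous umask
os.remove(path)
```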