Support Questions
Find answers, ask questions, and share your expertise

Unable to access HDFS from CDSW session


Re: Unable to access HDFS from CDSW session

Master Collaborator

@Marek I think it's definitely a network issue now.

Node IP: "Public-IP-Address" not found in the host's network interfaces

This message indicates to me that the host's IP address has changed, or at least that the IP above is not present on any of the host's network interfaces.

This thread discusses the issue: https://github.com/kubernetes/kubernetes/issues/54337

The architecture you are using is not supported. You might be able to hack around it using the workaround discussed in the thread:

Using the --hostname-override=external-ip argument for kubelet

but that is not a long-term solution. My personal recommendation is to revise the network architecture, as CDSW is a little sensitive about this.
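As a quick sanity check, here is a minimal sketch (the function name is mine, and the address at the bottom is only a placeholder) for verifying whether a given IP is actually bound to one of the host's interfaces:

```shell
# Hedged sketch: check whether a given IPv4 address is bound to any local
# network interface. If it is not, kubelet logs errors like the
# "not found in the host's network interfaces" message quoted above.
check_ip_bound() {
    wanted_ip="$1"
    if command -v ip >/dev/null 2>&1; then
        # `ip -o` prints one line per address; -F = fixed string, -w = whole word
        ip -o -4 addr show | grep -qwF "$wanted_ip"
    else
        # fallback: the kernel's FIB trie also lists local addresses
        grep -qwF "$wanted_ip" /proc/net/fib_trie 2>/dev/null
    fi
}

# placeholder address -- replace with the IP CDSW is configured to use
check_ip_bound "192.0.2.10" || echo "address not bound to any interface"
```

If the configured address never shows up in that list, the fix belongs at the network/host level, not inside CDSW.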


Cheers!
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Re: Unable to access HDFS from CDSW session

Explorer

@GangWar I have changed the kubelet parameter in /opt/cloudera/parcels/CDSW/scripts/start-kubelet-master-standalone-core.sh as suggested:

#kubelet_opts+=(--hostname-override=${master_hostname_lower})
kubelet_opts+=(--hostname-override=external-ip)

Unfortunately, the pods (kube-apiserver, kube-scheduler, etcd) keep crashing and exiting.


Re: Unable to access HDFS from CDSW session

Master Collaborator
What do the process logs show? Are you able to see a successful registration of the host?
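In the meantime, here is a quick sketch of two checks you can run on the master (6443 is the default Kubernetes apiserver port; the log path is whatever your process directory shows, and the helper names are mine):

```shell
# Hedged sketch: quick health checks on the CDSW master node.

# 1) Is anything listening on the Kubernetes apiserver port (default 6443)?
apiserver_listening() {
    port="${1:-6443}"
    # ss -ltn: listening TCP sockets, numeric ports
    ss -ltn 2>/dev/null | grep -q ":${port} "
}

# 2) How many kubelet registration failures does a stderr log contain?
count_node_not_found() {
    grep -c 'not found' "$1"
}

# demo on an inline sample: two failing lines, one normal line
printf 'node "x" not found\nok\nnode "x" not found\n' > /tmp/kubelet-sample.log
count_node_not_found /tmp/kubelet-sample.log   # prints 2
```

If the apiserver port is not listening, kubelet can never register the node, whatever its hostname override says.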

Cheers!

Re: Unable to access HDFS from CDSW session

Explorer

I do not see any successful host registrations. Please see the tail of the process logs below.

[root@cdsw-master-01 ~]# tail -n 10 /var/run/cloudera-scm-agent/process/19{09..11}*/logs/stderr.log
==> /var/run/cloudera-scm-agent/process/1909-cdsw-CDSW_DOCKER/logs/stderr.log <==
time="2020-09-03T07:00:55.018288801Z" level=error msg="Handler for GET /containers/12437b8b7b3b452bc7bfe8a3a26fe253de38601b7dd5093bd3d67a8f52b50e6b/json returned error: write unix /var/run/docker.sock->@: write: broken pipe" 
2020-09-03 07:00:55.018357 I | http: multiple response.WriteHeader calls
time="2020-09-03T07:01:12.350659606Z" level=info msg="stopping containerd after receiving terminated" 
time="2020-09-03T07:01:12.351645251Z" level=info msg="Processing signal 'terminated'" 
time="2020-09-03T07:01:12.352045287Z" level=error msg="libcontainerd: failed to receive event from containerd: rpc error: code = 13 desc = transport is closing" 
time="2020-09-03T07:01:13.187239486Z" level=info msg="libcontainerd: new containerd process, pid: 9176" 
time="2020-09-03T07:01:13.206461276Z" level=error msg="containerd: notify OOM events" error="open /proc/8671/cgroup: no such file or directory" 
time="2020-09-03T07:01:13.206730882Z" level=error msg="containerd: notify OOM events" error="open /proc/8808/cgroup: no such file or directory" 
time="2020-09-03T07:01:13.206985589Z" level=error msg="containerd: notify OOM events" error="open /proc/8995/cgroup: no such file or directory" 
time="2020-09-03T07:01:13.904988075Z" level=info msg="stopping containerd after receiving terminated" 

==> /var/run/cloudera-scm-agent/process/1910-cdsw-CDSW_MASTER/logs/stderr.log <==
E0903 07:00:54.262100   31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.362293   31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.462458   31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.480206   31064 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.133.210.200:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dexternal-ip&limit=500&resourceVersion=0: dial tcp 10.133.210.200:6443: connect: connection refused
E0903 07:00:54.480889   31064 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://10.133.210.200:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.133.210.200:6443: connect: connection refused
E0903 07:00:54.481951   31064 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://10.133.210.200:6443/api/v1/nodes?fieldSelector=metadata.name%3Dexternal-ip&limit=500&resourceVersion=0: dial tcp 10.133.210.200:6443: connect: connection refused
E0903 07:00:54.562631   31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.662826   31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.763006   31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.863203   31064 kubelet.go:2266] node "external-ip" not found

==> /var/run/cloudera-scm-agent/process/1911-cdsw-CDSW_APPLICATION/logs/stderr.log <==
    func(*targs, **kargs)
  File "/opt/cloudera/parcels/CDSW-1.7.2.p1.2066404/cdsw_admin/cdsw/admin.py", line 63, in stop
    os.killpg(os.getpid(), signal.SIGKILL)
OSError: [Errno 3] No such process
+ is_kubelet_process_up
+ is_kube_cluster_configured
+ '[' -e /etc/kubernetes/admin.conf ']'
+ return 0
++ KUBECONFIG=/etc/kubernetes/kubelet.conf
++ /opt/cloudera/parcels/CDSW-1.7.2.p1.2066404/kubernetes/bin/kubectl get nodes

Re: Unable to access HDFS from CDSW session

Explorer

@GangWar The problem with the crashing/exiting pods is now fixed. After restoring the CDSW master host, I had mistakenly set its MASTER_IP in the CM config to the address resolved by DNS from the CDSW FQDN, whereas it should be the host's private IP address within the Cloudera cluster. With that corrected, the intermediate problem is solved.
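For the record, here is roughly how I sanity-check the MASTER_IP value now (a sketch; treating "private" as RFC 1918 is my assumption about the cluster network, and the addresses in the example are made up):

```shell
# Hedged sketch: filter a list of IPv4 addresses down to private (RFC 1918)
# ones -- candidate values for MASTER_IP in the CM config. Feed it the host's
# addresses, e.g. the output of `ip -o -4 addr show` or `hostname -I`.
rfc1918_only() {
    grep -E '^(10\.|172\.(1[6-9]|2[0-9]|3[01])\.|192\.168\.)'
}

# made-up example: only the private addresses survive
printf '203.0.113.7\n10.133.210.200\n192.168.1.5\n' | rfc1918_only
# prints:
#   10.133.210.200
#   192.168.1.5
```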

 

May I then kindly ask for further assistance in troubleshooting the original issue with HDFS access from CDSW sessions.