Member since: 05-07-2020
Posts: 31
Kudos Received: 0
Solutions: 1

My Accepted Solutions

Title | Views | Posted |
---|---|---|
 | 345 | 06-23-2020 01:13 AM |
09-03-2020
06:22 AM
@GangWar The problem with crashing/exiting pods is now fixed. After restoring the CDSW master host I had mistakenly set its MASTER_IP in the CM configuration to the address resolved by DNS from the CDSW FQDN, whereas it should be the host's private IP address within the Cloudera cluster. With that corrected, the intermediate problem is solved. May I then kindly ask for further assistance in troubleshooting the original issue with HDFS access from CDSW sessions?
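For reference, the two addresses can be compared directly on the master host roughly like this (the interface name eth0 is just an example):
[root@cdsw-master-01 ~]# nslookup $(hostname -f)   # address resolved by DNS from the CDSW FQDN (what I had mistakenly used)
[root@cdsw-master-01 ~]# ip -4 addr show eth0      # the host's private cluster IP, i.e. the value MASTER_IP should hold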
09-03-2020
12:33 AM
I do not see any successful host registrations. Please see below the tail of the process logs.

[root@cdsw-master-01 ~]# tail -n 10 /var/run/cloudera-scm-agent/process/19{09..11}*/logs/stderr.log
==> /var/run/cloudera-scm-agent/process/1909-cdsw-CDSW_DOCKER/logs/stderr.log <==
time="2020-09-03T07:00:55.018288801Z" level=error msg="Handler for GET /containers/12437b8b7b3b452bc7bfe8a3a26fe253de38601b7dd5093bd3d67a8f52b50e6b/json returned error: write unix /var/run/docker.sock->@: write: broken pipe"
2020-09-03 07:00:55.018357 I | http: multiple response.WriteHeader calls
time="2020-09-03T07:01:12.350659606Z" level=info msg="stopping containerd after receiving terminated"
time="2020-09-03T07:01:12.351645251Z" level=info msg="Processing signal 'terminated'"
time="2020-09-03T07:01:12.352045287Z" level=error msg="libcontainerd: failed to receive event from containerd: rpc error: code = 13 desc = transport is closing"
time="2020-09-03T07:01:13.187239486Z" level=info msg="libcontainerd: new containerd process, pid: 9176"
time="2020-09-03T07:01:13.206461276Z" level=error msg="containerd: notify OOM events" error="open /proc/8671/cgroup: no such file or directory"
time="2020-09-03T07:01:13.206730882Z" level=error msg="containerd: notify OOM events" error="open /proc/8808/cgroup: no such file or directory"
time="2020-09-03T07:01:13.206985589Z" level=error msg="containerd: notify OOM events" error="open /proc/8995/cgroup: no such file or directory"
time="2020-09-03T07:01:13.904988075Z" level=info msg="stopping containerd after receiving terminated"
==> /var/run/cloudera-scm-agent/process/1910-cdsw-CDSW_MASTER/logs/stderr.log <==
E0903 07:00:54.262100 31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.362293 31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.462458 31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.480206 31064 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.133.210.200:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dexternal-ip&limit=500&resourceVersion=0: dial tcp 10.133.210.200:6443: connect: connection refused
E0903 07:00:54.480889 31064 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://10.133.210.200:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.133.210.200:6443: connect: connection refused
E0903 07:00:54.481951 31064 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://10.133.210.200:6443/api/v1/nodes?fieldSelector=metadata.name%3Dexternal-ip&limit=500&resourceVersion=0: dial tcp 10.133.210.200:6443: connect: connection refused
E0903 07:00:54.562631 31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.662826 31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.763006 31064 kubelet.go:2266] node "external-ip" not found
E0903 07:00:54.863203 31064 kubelet.go:2266] node "external-ip" not found
==> /var/run/cloudera-scm-agent/process/1911-cdsw-CDSW_APPLICATION/logs/stderr.log <==
func(*targs, **kargs)
File "/opt/cloudera/parcels/CDSW-1.7.2.p1.2066404/cdsw_admin/cdsw/admin.py", line 63, in stop
os.killpg(os.getpid(), signal.SIGKILL)
OSError: [Errno 3] No such process
+ is_kubelet_process_up
+ is_kube_cluster_configured
+ '[' -e /etc/kubernetes/admin.conf ']'
+ return 0
++ KUBECONFIG=/etc/kubernetes/kubelet.conf
++ /opt/cloudera/parcels/CDSW-1.7.2.p1.2066404/kubernetes/bin/kubectl get nodes
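A few checks along these lines should show whether the kube-apiserver is listening at all on the master (the IP/port come from the errors above; the un-versioned parcel path is an assumption):
[root@cdsw-master-01 ~]# ss -tlnp | grep 6443               # is anything bound to the API server port?
[root@cdsw-master-01 ~]# docker ps -a | grep -i apiserver   # state of the kube-apiserver container
[root@cdsw-master-01 ~]# KUBECONFIG=/etc/kubernetes/kubelet.conf /opt/cloudera/parcels/CDSW/kubernetes/bin/kubectl get nodes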
09-02-2020
11:54 PM
@GangWar I have changed the kubelet parameter in /opt/cloudera/parcels/CDSW/scripts/start-kubelet-master-standalone-core.sh as suggested:

#kubelet_opts+=(--hostname-override=${master_hostname_lower})
kubelet_opts+=(--hostname-override=external-ip)

Unfortunately the pods (kube-apiserver, kube-scheduler, etcd) keep crashing/exiting.
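If it helps, the exit reasons of the crashing pods can be pulled from the stopped containers roughly like this (<container-id> is a placeholder taken from the first listing):
[root@cdsw-master-01 ~]# docker ps -a --filter status=exited | egrep -i 'apiserver|scheduler|etcd'
[root@cdsw-master-01 ~]# docker logs --tail 50 <container-id>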
09-02-2020
07:22 AM
I have performed some further troubleshooting. According to the CDSW master docker process stderr.log, there might be a problem with Kubernetes DNS resolution due to missing weave containers for pod networking. Indeed, a DNS lookup cannot resolve the FQDN of one of the container repositories – docker-registry.infra.cloudera.com – which is supposed to hold the weave containers. Could you verify and confirm whether that is the root cause?
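To narrow it down, the registry lookup and the weave images already present locally can be checked like this:
[root@cdsw-master-01 ~]# nslookup docker-registry.infra.cloudera.com   # fails to resolve in my environment
[root@cdsw-master-01 ~]# docker images | grep -i weave                 # weave images loaded from the parcel, if any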
08-24-2020
01:59 AM
@GangWar Followed the steps jointly with a Cloudera representative (Kamel D). Unfortunately the problem is still there – several containers keep exiting.
08-17-2020
01:17 AM
@GangWar Which command should I run manually from the terminal, on which cluster hosts, and at which point in the overall procedure of adding the CDSW service to the cluster? Nonetheless, I have removed the CDSW roles and host from the cluster and Cloudera Manager, created another clean VM, adjusted its configuration to meet the requirements, and added the CDSW service and its roles back on the new host. Unfortunately the CDSW service reports the same errors as before and the web GUI is not accessible. The docker-thinpool logical volume has been created successfully, however the containers keep crashing/exiting:
08-12-2020
01:09 AM
@GangWar I am confused – earlier you wrote that CDSW does not care about the /etc/hosts file, and now that the short names should be declared in /etc/hosts. Which statement is correct? Notwithstanding that, if the CDSW hosts are managed by Cloudera Manager, shouldn't the latter take care of the relevant configuration of all the cluster hosts? In other words, if the CDH hosts in the cluster communicate correctly with the HDFS NameNodes based on the hdfs-site.xml config file, why don't the CDSW hosts?

Unfortunately, the CDSW master host then crashed and I was unable to restore it through Cloudera Manager. I tried to solve this by removing the CDSW service from the cluster, removing the CDSW host completely from the cluster, destroying and creating a new VM for the CDSW master, redeploying the prerequisites on it, and adding it back to CM and the cluster. However, now the problem is with adding the CDSW service back to the cluster – the procedure gets stuck while running /opt/cloudera/parcels/CDSW/scripts/create-docker-thinpool.sh. It hangs at the command:

lvcreate --wipesignatures y -n thinpool docker -l 95%VG

The procedure to add the CDSW service continues and completes only if I manually terminate the hanging lvcreate process from the CLI (kill -2 <pid>). However, the Docker daemon then seems to malfunction, as several service pods do not come up, including the CDSW web GUI.
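For what it is worth, checks along these lines can show whether leftover LVM metadata from the previous attempt is making lvcreate wait (the device name /dev/vdb is only an example):
[root@cdsw-master-01 ~]# pvs; vgs; lvs          # any leftover docker volume group / thinpool?
[root@cdsw-master-01 ~]# wipefs -n /dev/vdb     # dry run: report existing signatures without erasing anything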
08-03-2020
02:21 AM
@GangWar In which Oozie service configuration item in Cloudera Manager should this be defined?
08-03-2020
01:31 AM
Actually, on a clean CentOS 7.6 a simple pip install numpy does not work – the command returns "RuntimeError: Python version >= 3.6 required". I had to upgrade pip first, change the default permission mask (when installing system-wide as root, otherwise the installed numpy package is not readable by non-root users), and only then install numpy:

# pip install --upgrade pip
Collecting pip
[...]
# umask 022; pip install numpy

Nonetheless, this workaround is not scalable (it should be managed cluster-wide from Cloudera Manager, not from the command line) and goes against Python/pip best practices (pip should not be used for system-wide root package installations). Hence I am still looking for a way to make the PySpark script use the Anaconda Python on the cluster nodes.
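What I am effectively after is something along these lines, but applied cluster-wide from Cloudera Manager rather than per job (the Anaconda parcel path is an assumption on my side, your_script.py is a placeholder):
# export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
# export PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
# spark-submit --master yarn --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python your_script.py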
07-31-2020
07:14 AM
I have seen other topics with the same or a similar subject, in particular this one. I followed the hints, however they do not solve my problem, or it is unclear how to implement a solution. Hence let me create this alternate topic.
In a CDH 6.3.2 cluster I have an Anaconda parcel distributed and activated, which of course includes the numpy module. However, the Spark nodes seem to ignore the CDH configuration and keep using the system-wide Python from /usr/bin/python.
As a workaround I have also installed numpy in the system-wide Python across all cluster nodes, yet I still get the "ImportError: No module named numpy".
I would appreciate any further advice on how to solve the problem.
I am also not sure how to implement the solution referred to in https://stackoverflow.com/questions/46857090/adding-pyspark-python-path-in-oozie. Any clarification is much appreciated.
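For context, what I would expect to be able to set cluster-wide from Cloudera Manager is roughly the following (the exact safety-valve name and the parcel path are assumptions on my side):
# In CM: Spark service > Configuration > "Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh"
export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python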
Here is the error extracted from a Jupyter notebook output:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure:
Aborting TaskSet 1.0 because task 0 (partition 0)
cannot run anywhere due to node and executor blacklist.
Most recent failure:
Lost task 0.0 in stage 1.0 (TID 1, blc-worker-03.novalocal, executor 2): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/python/pyspark/worker.py", line 359, in main
func, profiler, deserializer, serializer = read_command(pickleSer, infile)
File "/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/python/pyspark/worker.py", line 64, in read_command
command = serializer._read_with_length(file)
File "/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/python/pyspark/serializers.py", line 172, in _read_with_length
return self.loads(obj)
File "/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/python/pyspark/serializers.py", line 580, in loads
return pickle.loads(obj)
File "/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/python/pyspark/mllib/__init__.py", line 28, in <module>
import numpy
ImportError: No module named numpy
07-31-2020
05:54 AM
In a CDH 6.3.2 cluster I have an Anaconda parcel distributed and activated, which of course includes the numpy module. However, the Spark nodes seem to ignore the CDH configuration and keep using the system-wide Python from /usr/bin/python. As a workaround I have also installed numpy in the system-wide Python across all cluster nodes, yet I still get the "ImportError: No module named numpy". I would appreciate any further advice on how to solve the problem. I am not sure how to implement the solution referred to in https://stackoverflow.com/questions/46857090/adding-pyspark-python-path-in-oozie.
07-31-2020
05:48 AM
@kernel8liang Could you please explain how to implement the solution?
07-28-2020
04:32 AM
@GangWar Please see the CDSW session command log and the hdfs-site.xml file contents enclosed.

!echo $PATH
/usr/lib/jvm/jre-openjdk/bin:/home/cdsw/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/conda/bin:/opt/cloudera/parcels/CDH/bin:/home/cdsw/.conda/envs/python3.6/bin
!which hdfs
/opt/cloudera/parcels/CDH/bin/hdfs
!/opt/cloudera/parcels/CDH/bin/hdfs dfs -ls /
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"WARN","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/07/28 11:28:47","logger":"hdfs.DFSUtilClient","timezone":"UTC","log":{"message":"Namenode for namenodeHA remains unresolved for ID namenode43. Check your hdfs-site.xml file to ensure namenodes are configured properly."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"WARN","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/07/28 11:28:47","logger":"hdfs.DFSUtilClient","timezone":"UTC","log":{"message":"Namenode for namenodeHA remains unresolved for ID namenode57. Check your hdfs-site.xml file to ensure namenodes are configured properly."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/07/28 11:28:47","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "blc-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over blc-control-03.novalocal:8020 after 1 failover attempts. Trying to failover after sleeping for 1424ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/07/28 11:28:49","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "blc-control-02.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over blc-control-02.novalocal:8020 after 2 failover attempts. Trying to failover after sleeping for 2662ms."}} <?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
<property>
<name>dfs.nameservices</name>
<value>namenodeHA</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.namenodeHA</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled.namenodeHA</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>blc-control-01.novalocal:2181,blc-control-02.novalocal:2181,blc-control-03.novalocal:2181</value>
</property>
<property>
<name>dfs.ha.namenodes.namenodeHA</name>
<value>namenode43,namenode57</value>
</property>
<property>
<name>dfs.namenode.rpc-address.namenodeHA.namenode43</name>
<value>blc-control-02.novalocal:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.namenodeHA.namenode43</name>
<value>blc-control-02.novalocal:8022</value>
</property>
<property>
<name>dfs.namenode.http-address.namenodeHA.namenode43</name>
<value>blc-control-02.novalocal:9870</value>
</property>
<property>
<name>dfs.namenode.https-address.namenodeHA.namenode43</name>
<value>blc-control-02.novalocal:9871</value>
</property>
<property>
<name>dfs.namenode.rpc-address.namenodeHA.namenode57</name>
<value>blc-control-03.novalocal:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.namenodeHA.namenode57</name>
<value>blc-control-03.novalocal:8022</value>
</property>
<property>
<name>dfs.namenode.http-address.namenodeHA.namenode57</name>
<value>blc-control-03.novalocal:9870</value>
</property>
<property>
<name>dfs.namenode.https-address.namenodeHA.namenode57</name>
<value>blc-control-03.novalocal:9871</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>fs.permissions.umask-mode</name>
<value>022</value>
</property>
<property>
<name>dfs.client.block.write.locateFollowingBlock.retries</name>
<value>7</value>
</property>
<property>
<name>dfs.namenode.acls.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>false</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hdfs-sockets/dn</value>
</property>
<property>
<name>dfs.client.read.shortcircuit.skip.checksum</name>
<value>false</value>
</property>
<property>
<name>dfs.client.domain.socket.data.traffic</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
<value>true</value>
</property>
<property>
<name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
<value>ALWAYS</value>
</property>
<property>
<name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
<value>true</value>
</property>
</configuration>
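For completeness, name resolution of the NameNode hosts can also be tested from within the CDSW session itself, e.g. (hostnames taken from the config above):
!getent hosts blc-control-02.novalocal blc-control-03.novalocal
!nslookup blc-control-02.novalocal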
07-20-2020
01:22 AM
Let me kindly follow up and remind about this open support question.
07-02-2020
02:10 AM
@GangWar I do confirm that I am able to list the HDFS files from the CDSW master node:

[root@cdsw-master-01 ~]# hdfs dfs -ls /
Found 3 items
drwxr-xr-x - hbase hbase 0 2020-06-29 19:23 /hbase
drwxrwxrwt - hdfs supergroup 0 2020-06-29 21:05 /tmp
drwxr-xr-x - hdfs supergroup 0 2020-06-29 21:44 /user

I have re-deployed the client configurations and refreshed the cluster, and I have restarted the NameNode roles. I do confirm that the HDFS gateway roles are available on the CDSW hosts. Please clarify what you mean by "Form CDSW host doc a list on HDFS". From a CDSW session input prompt I try to access HDFS, however I still get the error:

!hdfs dfs -ls /
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"WARN","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/07/02 09:08:35","logger":"hdfs.DFSUtilClient","timezone":"UTC","log":{"message":"Namenode for namenodeHA remains unresolved for ID namenode43. Check your hdfs-site.xml file to ensure namenodes are configured properly."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"WARN","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/07/02 09:08:35","logger":"hdfs.DFSUtilClient","timezone":"UTC","log":{"message":"Namenode for namenodeHA remains unresolved for ID namenode57. Check your hdfs-site.xml file to ensure namenodes are configured properly."}} Hence would appreciate your further assistance in the troubleshooting.
07-01-2020
03:14 AM
@GangWar I do confirm that localhost resolves to 127.0.0.1, not to 127.0.0.0, which I believe is a typo in your message, isn't it?

[root@cdsw-master-01 ~]# nslookup localhost
Server: 172.16.1.3
Address: 172.16.1.3#53
Non-authoritative answer:
Name: localhost
Address: 127.0.0.1

This is related to a CDSW proof-of-concept/trial on top of a CDH Enterprise R&D cluster, hence I am unable to submit a support case, though I would be glad to do so. Please check your private messages inbox regarding the logs bundle.
06-29-2020
05:49 AM
Hi, I would appreciate any advice on how to solve a problem with terminal access from a CDSW session. Let me highlight that I can launch a session, however within the session I am unable to access the terminal – please see the attached screenshot with HTTP ERROR 401. The networking requirements are met, in particular:
- IPv6 is enabled
- CDSW hosts are within the same subnet as the CDH cluster
- DNS is configured with the relevant A record for the domain name, a CNAME record for the wildcard domain, and a reverse PTR record (please see the enclosed ping output, where DNS resolves the terminal's FQDN to the CDSW master node's IP)
- No iptables rules are enabled
- SELinux is disabled

[cloud-user@cdh-control-01 ~]$ ping -c1 tty-jidv65sd8630btx4.cdsw.<intranetdomain>
PING cdsw.<intranetdomain> (10.133.210.200) 56(84) bytes of data.
64 bytes from cdsw.<intranetdomain> (10.133.210.200): icmp_seq=1 ttl=60 time=0.884 ms
--- cdsw.<intranetdomain> ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.884/0.884/0.884/0.000 ms
- Tags:
- cdsw
06-29-2020
01:44 AM
Hi, Is a CDH Enterprise license transferable from one cluster (deployed just for a proof-of-concept and after e.g. 3 months destroyed), to another cluster (deployed only after the former one was destroyed)?
- Tags:
- cdh
- enterprise
- license
06-25-2020
01:10 AM
I do confirm that the CDSW hosts meet all the networking requirements, in particular:
- IPv6 is enabled
- CDSW hosts are within the same subnet as the CDH cluster
- DNS is configured with the relevant A record for the domain name, a CNAME record for the wildcard domain, and a reverse PTR record
- No iptables rules are enabled
- SELinux is disabled

Let me also clarify – I can launch a session, however within the session I am unable to access HDFS, neither from the input prompt (as in my first post) nor from any script. Example DNS lookup commands from a session's input prompt:

!nslookup *.cdsw.<intranetdomain>
Server: 100.77.0.10
Address: 100.77.0.10#53
Non-authoritative answer:
*.cdsw.<intranetdomain> canonical name = cdsw.<intranetdomain>.
Name: cdsw.<intranetdomain>
Address: 10.133.210.200
!dig -x 10.133.210.200
; <<>> DiG 9.11.3-1ubuntu1.11-Ubuntu <<>> -x 10.133.210.200
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60863
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;200.210.133.10.in-addr.arpa. IN PTR
;; ANSWER SECTION:
200.210.133.10.in-addr.arpa. 300 IN PTR cdsw.<intranetdomain>.
;; Query time: 307 msec
;; SERVER: 100.77.0.10#53(100.77.0.10)
;; WHEN: Thu Jun 25 08:05:22 UTC 2020
;; MSG SIZE rcvd: 93

I have also noticed that I am unable to access a terminal – the web browser returns HTTP ERROR 401, even though DNS resolves the terminal's FQDN to the CDSW master node's IP.

[cloud-user@cdh-control-01 ~]$ ping -c1 tty-jidv65sd8630btx4.cdsw.<intranetdomain>
PING cdsw.<intranetdomain> (10.133.210.200) 56(84) bytes of data.
64 bytes from cdsw.<intranetdomain> (10.133.210.200): icmp_seq=1 ttl=60 time=0.884 ms
--- cdsw.<intranetdomain> ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.884/0.884/0.884/0.000 ms
06-24-2020
07:35 AM
Hi, I would appreciate any advice on how to solve the following problem – in a CDH 6.3.2 HA-enabled cluster I am unable to access HDFS from a CDSW CLI session:

!hdfs dfs -ls /
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"WARN","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:08:37","logger":"hdfs.DFSUtilClient","timezone":"UTC","log":{"message":"Namenode for namenodeHA remains unresolved for ID namenode43. Check your hdfs-site.xml file to ensure namenodes are configured properly."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"WARN","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:08:37","logger":"hdfs.DFSUtilClient","timezone":"UTC","log":{"message":"Namenode for namenodeHA remains unresolved for ID namenode57. Check your hdfs-site.xml file to ensure namenodes are configured properly."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:08:38","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-03.novalocal:8020 after 1 failover attempts. Trying to failover after sleeping for 813ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:08:38","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-02.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-02.novalocal:8020 after 2 failover attempts. Trying to failover after sleeping for 1903ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:08:40","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-03.novalocal:8020 after 3 failover attempts. Trying to failover after sleeping for 2225ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:08:43","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-02.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-02.novalocal:8020 after 4 failover attempts. Trying to failover after sleeping for 9688ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:08:52","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-03.novalocal:8020 after 5 failover attempts. Trying to failover after sleeping for 9501ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:09:02","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-02.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-02.novalocal:8020 after 6 failover attempts. Trying to failover after sleeping for 9001ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:09:11","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-03.novalocal:8020 after 7 failover attempts. Trying to failover after sleeping for 13904ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:09:25","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-02.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-02.novalocal:8020 after 8 failover attempts. Trying to failover after sleeping for 14567ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:09:39","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-03.novalocal:8020 after 9 failover attempts. Trying to failover after sleeping for 15279ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:09:55","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-02.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-02.novalocal:8020 after 10 failover attempts. Trying to failover after sleeping for 10985ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:10:05","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-03.novalocal:8020 after 11 failover attempts. Trying to failover after sleeping for 8394ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:10:14","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-02.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-02.novalocal:8020 after 12 failover attempts. Trying to failover after sleeping for 21701ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:10:36","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-03.novalocal:8020 after 13 failover attempts. Trying to failover after sleeping for 16983ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:10:53","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-02.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-02.novalocal:8020 after 14 failover attempts. Trying to failover after sleeping for 8437ms."}}
ls: Invalid host name: local host is: (unknown); destination host is: "cdh-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost

The /etc/hosts file on the CDH and CDSW nodes contains:

# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.10.112 cdh-control-02.novalocal
10.10.10.111 cdh-control-01.novalocal
10.10.10.131 cdh-worker-01.novalocal
10.10.10.132 cdh-worker-02.novalocal
10.10.10.122 cdh-edge-02.novalocal
10.10.10.113 cdh-control-03.novalocal
10.10.10.121 cdh-edge-01.novalocal
10.10.10.133 cdh-worker-03.novalocal
10.10.10.110 cdsw-master-01.novalocal
10.10.10.130 cdsw-worker-01.novalocal
06-23-2020
01:13 AM
The problem is solved – I removed the parcel, added the ".parcel" file type to /etc/httpd/conf/httpd.conf on the local repo host:

# sudo sed -i 's/AddType application\/x-gzip .gz .tgz$/AddType application\/x-gzip .gz .tgz .parcel/g' /etc/httpd/conf/httpd.conf

and then downloaded, distributed, and activated the parcel again.
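Note that after the change the web server needs a reload, and the served Content-Type can be verified with curl (the repo URL below is only an example):
# sudo systemctl reload httpd
# curl -sI http://<local-repo-host>/cdsw/CDSW-1.7.2.p1.2066404-el7.parcel | grep -i content-type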
06-22-2020
06:40 AM
The *.jar file is already in /opt/cloudera/csd/ and I have restarted the CM server – otherwise I would not be able to distribute and activate the CDSW parcel, would I? Let me say it again – the problem appears after the parcel distribution and activation: I do not see the CDSW service in the 'Add a Service' window, as outlined in the installation guide:
06-22-2020
03:03 AM
A kind reminder about this open support question. I am also attaching two screenshots.
06-18-2020
01:25 AM
Hi, I am trying to add a CDSW 1.7.2 trial on two new hosts (master and worker) to a CDH 6.3.2 cluster managed with CM 6.3.1. The CDSW parcel is downloaded, distributed, and activated. Unfortunately the CDSW service is not available in the cluster's "Add Service" list. I would appreciate any assistance in troubleshooting.
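For reference, the CSD prerequisite can be verified on the Cloudera Manager server host roughly like this (default paths assumed):
# ls -l /opt/cloudera/csd/               # the CDSW CSD *.jar should be present here
# systemctl restart cloudera-scm-server  # restart CM so that a newly added CSD is picked up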
05-19-2020
06:04 AM
Running CDH 6.3.x. The workarounds of removing .scratchdir, or the one referred to in https://issues.cloudera.org/browse/HUE-8910 (adding a random UUID to the directory name, i.e. .scratchdir.<UUID>), do not solve the problem. Unfortunately I do not have access to the knowledge base. How can this problem be solved?