Created 06-24-2020 07:35 AM
Hi,
I would appreciate any advice on how to solve the following problem: in a CDH 6.3.2 HA-enabled cluster, I am unable to access HDFS from a CDSW CLI session:
!hdfs dfs -ls /
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"WARN","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:08:37","logger":"hdfs.DFSUtilClient","timezone":"UTC","log":{"message":"Namenode for namenodeHA remains unresolved for ID namenode43. Check your hdfs-site.xml file to ensure namenodes are configured properly."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"WARN","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:08:37","logger":"hdfs.DFSUtilClient","timezone":"UTC","log":{"message":"Namenode for namenodeHA remains unresolved for ID namenode57. Check your hdfs-site.xml file to ensure namenodes are configured properly."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:08:38","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-03.novalocal:8020 after 1 failover attempts. Trying to failover after sleeping for 813ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:08:38","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-02.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-02.novalocal:8020 after 2 failover attempts. Trying to failover after sleeping for 1903ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:08:40","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-03.novalocal:8020 after 3 failover attempts. Trying to failover after sleeping for 2225ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:08:43","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-02.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-02.novalocal:8020 after 4 failover attempts. Trying to failover after sleeping for 9688ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:08:52","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-03.novalocal:8020 after 5 failover attempts. Trying to failover after sleeping for 9501ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:09:02","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-02.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-02.novalocal:8020 after 6 failover attempts. Trying to failover after sleeping for 9001ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:09:11","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-03.novalocal:8020 after 7 failover attempts. Trying to failover after sleeping for 13904ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:09:25","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-02.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-02.novalocal:8020 after 8 failover attempts. Trying to failover after sleeping for 14567ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:09:39","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-03.novalocal:8020 after 9 failover attempts. Trying to failover after sleeping for 15279ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:09:55","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-02.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-02.novalocal:8020 after 10 failover attempts. Trying to failover after sleeping for 10985ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:10:05","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-03.novalocal:8020 after 11 failover attempts. Trying to failover after sleeping for 8394ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:10:14","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-02.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-02.novalocal:8020 after 12 failover attempts. Trying to failover after sleeping for 21701ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:10:36","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-03.novalocal:8020 after 13 failover attempts. Trying to failover after sleeping for 16983ms."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"INFO","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/06/24 13:10:53","logger":"retry.RetryInvocationHandler","timezone":"UTC","log":{"message":"java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "cdh-control-02.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over cdh-control-02.novalocal:8020 after 14 failover attempts. Trying to failover after sleeping for 8437ms."}}
ls: Invalid host name: local host is: (unknown); destination host is: "cdh-control-03.novalocal":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost
The contents of the /etc/hosts file on the CDH and CDSW nodes are:
# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.10.112 cdh-control-02.novalocal
10.10.10.111 cdh-control-01.novalocal
10.10.10.131 cdh-worker-01.novalocal
10.10.10.132 cdh-worker-02.novalocal
10.10.10.122 cdh-edge-02.novalocal
10.10.10.113 cdh-control-03.novalocal
10.10.10.121 cdh-edge-01.novalocal
10.10.10.133 cdh-worker-03.novalocal
10.10.10.110 cdsw-master-01.novalocal
10.10.10.130 cdsw-worker-01.novalocal
Created 06-24-2020 02:59 PM
@Marek CDSW doesn't use the /etc/hosts file. You must meet all of the network requirements below:
Most importantly, forward and reverse lookups and the wildcard DNS must work from a session. The issue seems to be with the wildcard DNS, as the hostname is not resolving from the session.
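A forward and reverse lookup can be checked from a session's input prompt with a few lines of Python. This is only a minimal sketch using the standard `socket` module; the function names `check_forward_reverse` and `main` are illustrative, and the hostnames are the ones from the error output earlier in this thread:

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, then resolve that IP back to a name.

    The resolver functions are parameters so they can be swapped out;
    by default the system resolver of the current process is used.
    """
    result = {"hostname": hostname, "ip": None, "reverse_name": None, "error": None}
    try:
        result["ip"] = forward(hostname)
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        result["reverse_name"] = reverse(result["ip"])[0]
    except OSError as exc:
        # socket.gaierror and socket.herror are both subclasses of OSError
        result["error"] = str(exc)
    return result


def main(hosts=("cdh-control-02.novalocal", "cdh-control-03.novalocal")):
    """Print the lookup result for each host (hostnames from this thread)."""
    for host in hosts:
        print(check_forward_reverse(host))
```

Calling `main()` inside the session and again on the CDSW master node should show whether the two environments resolve the NameNode hosts differently.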
Created on 06-25-2020 01:10 AM - edited 06-25-2020 09:38 AM
I do confirm that the CDSW hosts meet all the networking requirements, in particular:
Let me also clarify: I can launch a session, but within the session I cannot access HDFS, either from the input prompt (as in my first post) or from a script.
Example DNS lookup commands from a session's input prompt:
!nslookup *.cdsw.<intranetdomain>
Server: 100.77.0.10
Address: 100.77.0.10#53
Non-authoritative answer:
*.cdsw.<intranetdomain> canonical name = cdsw.<intranetdomain>.
Name: cdsw.<intranetdomain>
Address: 10.133.210.200
!dig -x 10.133.210.200
; <<>> DiG 9.11.3-1ubuntu1.11-Ubuntu <<>> -x 10.133.210.200
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60863
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;200.210.133.10.in-addr.arpa. IN PTR
;; ANSWER SECTION:
200.210.133.10.in-addr.arpa. 300 IN PTR cdsw.<intranetdomain>.
;; Query time: 307 msec
;; SERVER: 100.77.0.10#53(100.77.0.10)
;; WHEN: Thu Jun 25 08:05:22 UTC 2020
;; MSG SIZE rcvd: 93
I have also noticed that I am unable to access a terminal: the web browser returns HTTP ERROR 401, even though DNS resolves the terminal's FQDN to the CDSW master node's IP.
[cloud-user@cdh-control-01 ~]$ ping -c1 tty-jidv65sd8630btx4.cdsw.<intranetdomain>
PING cdsw.<intranetdomain> (10.133.210.200) 56(84) bytes of data.
64 bytes from cdsw.<intranetdomain> (10.133.210.200): icmp_seq=1 ttl=60 time=0.884 ms
--- cdsw.<intranetdomain> ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.884/0.884/0.884/0.000 ms
Created on 06-29-2020 01:24 AM - edited 06-29-2020 05:42 AM
A kind reminder about this open support question.
Created 07-01-2020 01:11 AM
@Marek CDSW doesn't honour the /etc/hosts file, so that's not the issue. Can you confirm that localhost is resolving to 127.0.0.0? If it is, please share the CDSW logs bundle and I will check once more.
cdsw logs -x
If you have a support subscription, feel free to file a case with us; we will be more than happy to assist you.
Created 07-01-2020 03:14 AM
I do confirm that localhost resolves to 127.0.0.1, not to 127.0.0.0, which I believe is a typo, isn't it?
[root@cdsw-master-01 ~]# nslookup localhost
Server: 172.16.1.3
Address: 172.16.1.3#53
Non-authoritative answer:
Name: localhost
Address: 127.0.0.1
This is a CDSW proof-of-concept/trial on top of a CDH Enterprise R&D cluster, hence I am unable to submit a support case, though I would be glad to do so. Please check your private messages inbox regarding the logs bundle.
Created 07-01-2020 09:50 AM
@Marek Is hdfs dfs -ls working from the CDSW node itself?
The error below indicates an issue with the client configuration file.
Namenode for namenodeHA remains unresolved for ID namenode43. Check your hdfs-site.xml file to ensure namenodes are configured properly.
Can you check whether the gateway roles are installed and whether you are able to list files from the CDSW master node?
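The warning quoted above points at the client-side HA settings in hdfs-site.xml. As a minimal Python sketch, assuming the standard Hadoop HA property names (`dfs.nameservices`, `dfs.ha.namenodes.<nameservice>`, `dfs.namenode.rpc-address.<nameservice>.<id>`), the following lists the configured NameNode RPC addresses and reports which of them do not resolve. The helper names are illustrative and not part of any Hadoop or CDSW API:

```python
import socket
import xml.etree.ElementTree as ET


def load_hadoop_properties(xml_text):
    """Return {name: value} parsed from the text of a Hadoop *-site.xml file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def _split_csv(text):
    """Split a comma-separated value into non-empty, stripped items."""
    return [item.strip() for item in text.split(",") if item.strip()]


def namenode_rpc_addresses(props):
    """Map (nameservice, namenode_id) to its RPC address, or None if not set."""
    result = {}
    for ns in _split_csv(props.get("dfs.nameservices", "")):
        for nn_id in _split_csv(props.get("dfs.ha.namenodes." + ns, "")):
            key = "dfs.namenode.rpc-address.%s.%s" % (ns, nn_id)
            result[(ns, nn_id)] = props.get(key)
    return result


def unresolved(addresses, resolve=socket.gethostbyname):
    """Return {(nameservice, id): reason} for hosts that are missing or unresolvable."""
    bad = {}
    for key, address in addresses.items():
        host = address.rsplit(":", 1)[0] if address else None
        try:
            if host is None:
                raise OSError("no rpc-address configured")
            resolve(host)
        except OSError as exc:
            bad[key] = str(exc)
    return bad
```

To use it, read the client hdfs-site.xml as text (on a CDH gateway this is commonly under /etc/hadoop/conf, which is an assumption here), pass it to `load_hadoop_properties`, and print `unresolved(namenode_rpc_addresses(props))`. Running this both on the CDSW master node and inside a session should show whether the session sees the same configuration and can resolve the same hosts.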
I would like to perform:
Created on 07-02-2020 02:10 AM - edited 07-02-2020 03:01 AM
@GangWar I do confirm that I am able to list the HDFS files from the CDSW master node:
[root@cdsw-master-01 ~]# hdfs dfs -ls /
Found 3 items
drwxr-xr-x - hbase hbase 0 2020-06-29 19:23 /hbase
drwxrwxrwt - hdfs supergroup 0 2020-06-29 21:05 /tmp
drwxr-xr-x - hdfs supergroup 0 2020-06-29 21:44 /user
I have re-deployed the client configurations and refreshed the cluster.
I have restarted the NN roles.
I confirm that the HDFS gateway roles are available on the CDSW hosts:
Please clarify what you mean by "Form CDSW host doc a list on HDFS".
When I try to access HDFS from a CDSW session input prompt, I still get the error:
!hdfs dfs -ls /
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"WARN","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/07/02 09:08:35","logger":"hdfs.DFSUtilClient","timezone":"UTC","log":{"message":"Namenode for namenodeHA remains unresolved for ID namenode43. Check your hdfs-site.xml file to ensure namenodes are configured properly."}}
{"type":"log","host":"host_name","category":"HDFS-hdfs-GATEWAY-BASE","level":"WARN","system":"etcd_clcm_std_3C_2E_3W_cdh","time": "20/07/02 09:08:35","logger":"hdfs.DFSUtilClient","timezone":"UTC","log":{"message":"Namenode for namenodeHA remains unresolved for ID namenode57. Check your hdfs-site.xml file to ensure namenodes are configured properly."}}
Hence I would appreciate your further assistance with the troubleshooting.
Created 07-06-2020 02:17 AM
A kind reminder about this open support question.
Created 07-20-2020 01:22 AM
Let me refresh this thread with a kind reminder about this open support question.