Created 11-28-2017 05:04 PM
When creating a new Ambari datanode, the name shown in the logs and in the HDFS report differs from the hostname of the node. Output of the hdfs dfsadmin -report command:
Live datanodes (2):
Name: 131.100.200.83:50010 (IT431066.massey.ac.nz)
Hostname: augustus.massey.ac.nz
Decommission Status : Normal
Configured Capacity: 2442941495296 (2.22 TB)
DFS Used: 131072 (128 KB)
Non DFS Used: 0 (0 B)
DFS Remaining: 2044510578688 (1.86 TB)
DFS Used%: 0.00%
DFS Remaining%: 83.69%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Tue Nov 28 16:42:20 NZDT 2017
Last Block Report: Tue Nov 28 16:35:35 NZDT 2017
Name: 131.100.200.82:50010 (IT427066.massey.ac.nz)
Hostname: IT427066.massey.ac.nz
Decommission Status : Normal
Configured Capacity: 37791439071232 (34.37 TB)
DFS Used: 1163448320 (1.08 GB)
Non DFS Used: 0 (0 B)
DFS Remaining: 35613520156672 (32.39 TB)
DFS Used%: 0.00%
DFS Remaining%: 94.24%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Tue Nov 28 16:42:20 NZDT 2017
Last Block Report: Tue Nov 28 15:16:47 NZDT 2017
The Name in the first line differs from the actual Hostname. The IP address is also wrong: the node has a completely different IP address (192.168.1.108). I cannot find any reference to the IP address or hostname that shows up in that Name line.
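For anyone who wants to spot this kind of mismatch across many datanodes, here is a minimal sketch that scans saved `hdfs dfsadmin -report` output and lists every node whose name in parentheses on the Name line differs from its Hostname line. The function name `find_name_mismatches` and the regular expressions are my own assumptions based on the report format shown above, not part of any Hadoop tool.

```python
import re

# Matches lines such as "Name: 131.100.200.83:50010 (IT431066.massey.ac.nz)".
NAME_RE = re.compile(r"^Name:\s*(\S+):(\d+)\s*\((.*)\)\s*$")
# Matches lines such as "Hostname: augustus.massey.ac.nz".
HOST_RE = re.compile(r"^Hostname:\s*(\S+)\s*$")


def find_name_mismatches(report_text):
    """Return (ip, name_in_parentheses, reported_hostname) for each datanode
    whose Name line and Hostname line disagree (hypothetical helper)."""
    mismatches = []
    current = None  # (ip, name_in_parentheses) from the most recent Name line
    for raw_line in report_text.splitlines():
        line = raw_line.strip()
        name_match = NAME_RE.match(line)
        if name_match:
            current = (name_match.group(1), name_match.group(3))
            continue
        host_match = HOST_RE.match(line)
        if host_match and current is not None:
            ip, shown_name = current
            reported = host_match.group(1)
            # Hostnames are case-insensitive, so compare in lower case.
            if shown_name.lower() != reported.lower():
                mismatches.append((ip, shown_name, reported))
            current = None
    return mismatches


if __name__ == "__main__":
    sample = """\
Name: 131.100.200.83:50010 (IT431066.massey.ac.nz)
Hostname: augustus.massey.ac.nz
Name: 131.100.200.82:50010 (IT427066.massey.ac.nz)
Hostname: IT427066.massey.ac.nz
"""
    for ip, shown_name, reported in find_name_mismatches(sample):
        print(f"{ip}: report shows {shown_name}, datanode says {reported}")
```

With the two nodes from the report above, only the 131.100.200.83 entry is flagged. The comparison ignores case, so IT427066 and it427066 count as the same name.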
Any clues or help would be appreciated.
Created 11-30-2017 09:22 PM
Is this problem reproducible, or was it a one-time thing?
Created 12-03-2017 11:35 PM
Hi Szetszwo, thanks for the reply.
It is reproducible. It also prevents me from adding any other datanode to this cluster, because any new node will attempt to use the same name.
I would like to know where to change the HDFS name manually rather than let the system decide.
Finally, I only discovered this issue because I saw the HDFS report (from the command line) and the HDFS name in the logs; it does not show on the Ambari dashboard.
Thanks for any clues.
Created 12-04-2017 05:41 AM
1) On the DN, run the following commands:
hostname
hostname -f
ping `hostname`
--> the forward and reverse lookups should come up with the same hostname and IP address
2) If the above shows different results, it is worth checking DNS (if configured). If DNS is not configured, look at the files below on the DN server:
$cat /etc/hosts
$cat /etc/hostname
$cat /etc/sysconfig/network
All of these files should contain the same hostname and IP address.
3) Once the above steps are done, you can STOP and START the Ambari Agent.
4) If all of these come up with proper results, then the Ambari DB is the best place to look for any inconsistencies. To update the hostname in Ambari you can refer to this
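The forward and reverse lookup check from step 1 can also be scripted. This is only a sketch, assuming Python 3 is available on the DN. The helper name `check_lookup` is made up for illustration; it uses the standard `socket` resolver calls, which follow the system configuration (/etc/hosts, DNS) in the same order the OS does.

```python
import socket


def check_lookup(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve that IP back to a name, and report
    whether the round trip returns the original hostname (hypothetical helper).

    The forward and reverse parameters exist only so the resolvers can be
    swapped out when testing.
    """
    ip = forward(hostname)
    try:
        primary, aliases, _addresses = reverse(ip)
    except OSError:
        # No reverse entry (PTR record or /etc/hosts line) for this address.
        primary, aliases = None, []
    names = {name.lower() for name in [primary, *aliases] if name}
    return {
        "hostname": hostname,
        "ip": ip,
        "reverse": primary,
        "consistent": hostname.lower() in names,
    }


if __name__ == "__main__":
    fqdn = socket.getfqdn()
    try:
        print(check_lookup(fqdn))
    except OSError as exc:
        print(f"forward lookup failed for {fqdn}: {exc}")
```

If "consistent" comes back False, the "ip" and "reverse" fields show which side of the lookup disagrees, and that is the entry to compare against /etc/hosts and DNS.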
Hope this helps.
Thanks
Venkat
Created 12-05-2017 01:52 AM
Thanks for your reply Venkat.
I checked 1) and 2), and can confirm that everything points to augustus.massey.ac.nz and IP 192.168.1.108. Even stopping and starting Ambari does not change the report from HDFS.
I then tried to update the hostname as suggested in 4), and the answer (from the logs) is:
05 Dec 2017 14:31:04,108 ERROR [main] HostUpdateHelper:561 - Exception occurred during host names update, failed
org.apache.ambari.server.AmbariException: Hostname(s): it431066.massey.ac.nz was(were) not found.
at org.apache.ambari.server.update.HostUpdateHelper.validateHostChanges(HostUpdateHelper.java:197)
at org.apache.ambari.server.update.HostUpdateHelper.main(HostUpdateHelper.java:544)
That is what I expected, as I could not find any reference to the it431066 machine anywhere in the configuration files or in the Ambari repository. For example, "select host_name from hosts" returns:
it427066.massey.ac.nz
augustus.massey.ac.nz
Again, no mention of it431066. It is a very strange problem. There must be something in one of the installation scripts that is picking up an IP address from the network rather than using the one in /etc/hosts. The wrong hostname/IP is in sequence with the master server's IP number; that is the only sense I could make of the issue.
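One way to test the idea that an address is being picked up from the network is to ask the OS which local address it would use for traffic to the master. The sketch below is only an assumption about what is useful to check here. `local_source_address` is a made-up helper name, and the port number is arbitrary because a connected UDP socket sends nothing until data is written.

```python
import socket
import sys


def local_source_address(remote_host, remote_port=9):
    """Return the local IP the OS would pick for traffic to remote_host.

    Connecting a UDP socket sends no packets; it only makes the kernel
    choose a route and therefore a source address (hypothetical helper).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((remote_host, remote_port))
        return sock.getsockname()[0]


if __name__ == "__main__":
    # Pass the master server's address as the first argument;
    # without one, the loopback address is used as a harmless default.
    target = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    print(local_source_address(target))
```

Run on the datanode with the master's IP as the argument, this should print 192.168.1.108 if the expected interface is used. It only reports the local side, so any address translation between the two machines would not be visible here.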
I might just clean up everything and start the installation again in an isolated network and see how it goes.
Thanks for your help.
Andre.