Created 12-05-2017 06:15 AM
In our Ambari cluster we see the following.
On the first NameNode:
$ hdfs haadmin -checkHealth master01
Illegal argument: Unable to determine service address for namenode 'master01'
On the second NameNode:
$ hdfs haadmin -checkHealth master03
Illegal argument: Unable to determine service address for namenode 'master03'
What could be the problem here?
And how can I debug the command to find the root cause?
Note: the DNS configuration for all hosts/IPs is correct.
Another example, since the NameNode is down:
[hdfs@master01 root]$ hdfs haadmin -transitionToActive --forceactive master01
Illegal argument: Unable to determine service address for namenode 'master01'
Created 12-05-2017 08:40 AM
What is the value of dfs.ha.namenodes.{ha-cluster-name} in your hdfs-site.xml?
You can get the {ha-cluster-name} from fs.defaultFS in core-site.xml.
For example, if fs.defaultFS is hdfs://hortonworks, then hortonworks is the ha-cluster-name.
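The lookup described above can be sketched as a small shell function. This is only an illustration, assuming `hdfs getconf -confKey` is available on the host and that fs.defaultFS points at an HA nameservice; the function name `ha_service_ids` is made up for this example.

```shell
#!/bin/sh
# Print the NameNode service IDs (the names that "hdfs haadmin" expects).
# Assumption: fs.defaultFS has the form hdfs://<ha-cluster-name>.
ha_service_ids() {
    default_fs=$(hdfs getconf -confKey fs.defaultFS) || return 1
    # Strip the scheme, then anything after the name (port or path).
    nameservice=${default_fs#hdfs://}
    nameservice=${nameservice%%[:/]*}
    # The value is a comma-separated list; print it space-separated.
    hdfs getconf -confKey "dfs.ha.namenodes.${nameservice}" | tr ',' ' '
}
```

For a cluster whose fs.defaultFS is hdfs://hortonworks and whose dfs.ha.namenodes.hortonworks is nn1,nn2 (hypothetical values), calling `ha_service_ids` would print `nn1 nn2`.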
Thanks,
Aditya
Created 12-05-2017 08:56 AM
# grep dfs.ha.namenodes /etc/hadoop/conf/hdfs-site.xml
<name>dfs.ha.namenodes.hdfsha</name>
Created 12-05-2017 09:00 AM
Created 12-05-2017 10:22 AM
grep -A 3 dfs.ha.namenodes /etc/hadoop/conf/hdfs-site.xml
<name>dfs.ha.namenodes.hdfsha</name>
<value>nn1,nn2</value>
</property>
but:
[hdfs@master01 root]$ hdfs getconf -namenodes
master01.sys56.com master03.sys56.com
[hdfs@master01 root]$
Created 12-05-2017 10:30 AM
Run the health check with the service IDs, as below:
hdfs haadmin -checkHealth nn1
hdfs haadmin -checkHealth nn2
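The two commands above can also be wrapped in a loop. The sketch below is only an illustration: the function name `check_namenodes` is made up, and it relies on `hdfs haadmin -checkHealth` returning a non-zero exit status when the check fails (for example when the NameNode is down or the service ID is unknown).

```shell
#!/bin/sh
# Run "hdfs haadmin -checkHealth" for each service ID given as an argument.
# Returns 1 if at least one check failed, 0 otherwise.
check_namenodes() {
    failed=0
    for id in "$@"; do
        if hdfs haadmin -checkHealth "$id" >/dev/null 2>&1; then
            echo "$id: healthy"
        else
            echo "$id: check failed"
            failed=1
        fi
    done
    return "$failed"
}
```

It would be called as `check_namenodes nn1 nn2`. The command output is discarded here to keep the summary short; drop the redirection to see the underlying error message.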
Created 12-05-2017 10:33 AM
hdfs haadmin -checkHealth nn1
17/12/05 10:31:58 INFO ipc.Client: Retrying connect to server: master01.sys56.com/108.87.28.153:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From master01.sys56.com/108.87.28.153 to master01.sys56.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Created 12-05-2017 10:47 AM
The hostname looks different in two places.
hdfs getconf -namenodes gives 'master01.sys56.com' and the above logs give 'master01.sys564.com'.
Is it sys56 or sys564? Check the hostname carefully and start HDFS.
Make sure that the properties below are set correctly:
dfs.namenode.rpc-address.hdfsha.nn1 and dfs.namenode.rpc-address.hdfsha.nn2
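One way to inspect these properties from the command line is sketched below. The function name `rpc_addresses` is made up for this example, and the sketch assumes that `hdfs getconf -confKey` exits with a non-zero status when a key is not set.

```shell
#!/bin/sh
# Print "<serviceId> <host:port>" for each service ID of a nameservice.
# Usage: rpc_addresses <nameservice> <serviceId>...
rpc_addresses() {
    nameservice=$1
    shift
    for id in "$@"; do
        # Assumption: getconf fails when the key is missing.
        addr=$(hdfs getconf -confKey "dfs.namenode.rpc-address.${nameservice}.${id}") \
            || addr="(not set)"
        echo "$id $addr"
    done
}
```

For the cluster in this thread it would be called as `rpc_addresses hdfsha nn1 nn2`, and the hostnames it prints can then be compared with the output of `hdfs getconf -namenodes`.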
Thanks,
Aditya
Created 12-05-2017 11:45 AM
Just summarising things. The original issue discussed in this thread is "Unable to determine service address for namenode", which was caused by using the wrong service ID in the command. You were using 'master01' and 'master03' instead of 'nn1' and 'nn2'. After using the correct service IDs you got past the initial error, and you are now facing a connection refused error because the NameNodes are not started. I see another thread opened for the same issue ( https://community.hortonworks.com/questions/149951/how-to-force-name-node-to-be-active.html). Please do not deviate from the main issue. If you think that the main issue discussed in this thread is resolved, please accept the answer and follow up on a single thread. That makes it easier for other community users to follow the thread and understand the root cause.
Hope this helps 🙂
Thanks,
Aditya
Created 12-05-2017 12:22 PM
The serviceId is different from the NameNode hostname. I think that is fine; there is no conflict there.