
NameNode taking a long time to start; getting JMX metrics from NN failed

@Jay Kumar SenSharma

I added a new DataNode to the cluster.

After rebalancing, neither the active NameNode nor the standby NameNode will start.

Error log below:

2018-01-24 18:45:30,853 - call['hdfs haadmin -ns HACluster -getServiceState nn1'] {'logoutput': True, 'user': 'hdfs'}
18/01/24 18:45:32 INFO ipc.Client: Retrying connect to server: chtcuxhd02/172.16.0.99:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From chtcuxhd03/172.16.0.123 to chtcuxhd02:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2018-01-24 18:45:32,900 - call returned (255, '18/01/24 18:45:32 INFO ipc.Client: Retrying connect to server: chtcuxhd02/172.16.0.99:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)\nOperation failed: Call From chtcuxhd03/172.16.0.123 to chtcuxhd02:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused')
2018-01-24 18:45:32,901 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://chtcuxhd03:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmprRLtzf 2>/tmp/tmpM0oDpk''] {'quiet': False}
2018-01-24 18:45:33,024 - call returned (7, '')
2018-01-24 18:45:33,025 - Getting jmx metrics from NN failed. URL: http://chtcuxhd03:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 38, in get_value_from_jmx
_, data, _ = get_user_call_output(cmd, user=run_user, quiet=False)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output
raise ExecutionFailed(err_msg, code, files_output[0], files_output[1])
ExecutionFailed: Execution of 'curl -s 'http://chtcuxhd03:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' 1>/tmp/tmprRLtzf 2>/tmp/tmpM0oDpk' returned 7.

2018-01-24 18:45:33,026 - call['hdfs haadmin -ns HACluster -getServiceState nn2'] {'logoutput': True, 'user': 'hdfs'}
18/01/24 18:45:35 INFO ipc.Client: Retrying connect to server: chtcuxhd03/172.16.0.123:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From chtcuxhd03/172.16.0.123 to chtcuxhd03:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2018-01-24 18:45:35,127 - call returned (255, '18/01/24 18:45:35 INFO ipc.Client: Retrying connect to server: chtcuxhd03/172.16.0.123:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)\nOperation failed: Call From chtcuxhd03/172.16.0.123 to chtcuxhd03:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused')
2018-01-24 18:45:35,127 - NameNode HA states: active_namenodes = [], standby_namenodes = [], unknown_namenodes = [(u'nn1', 'chtcuxhd02:50070'), (u'nn2', 'chtcuxhd03:50070')]
2018-01-24 18:45:35,128 - Will retry 1 time(s), caught exception: No active NameNode was found.. Sleeping for 5 sec(s)
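For reference, the probe that Ambari runs can be reproduced by hand. This is a minimal sketch: the hostname and port come from the log above (adjust for your cluster), and the key point is that curl exit code 7 means the connection was refused outright, i.e. the NameNode process is not listening at all, not merely slow:

```shell
#!/bin/sh
# Hostname/port below are taken from the log above (assumption: same cluster layout).
NN_HOST=chtcuxhd03
NN_HTTP_PORT=50070

probe_nn_jmx() {
  # Same JMX query Ambari issues; curl -s exits 7 when it cannot connect.
  curl -s "http://$1:$2/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem" \
    > /dev/null 2>&1
  rc=$?
  case "$rc" in
    0) echo "NameNode web UI is up on $1:$2" ;;
    7) echo "connection refused: NameNode is not listening on $1:$2" ;;
    *) echo "curl failed with exit code $rc (e.g. 6 = could not resolve host)" ;;
  esac
}

probe_nn_jmx "$NN_HOST" "$NN_HTTP_PORT"
```

If the probe reports a refused connection, check whether the NameNode daemon is running on that host (`ps -ef | grep -i namenode`) and look at its own log for the actual startup failure, rather than at the Ambari retry loop.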

Thanks in advance

1 REPLY


The same issue occurs while enabling HDFS NameNode HA. In the Enable NameNode HA wizard, the "Start NameNode" step gets stuck at 35% and eventually fails, but it completes successfully on retry.
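When the wizard sits at 35%, the NameNode is usually still loading its fsimage and replaying edits. As a rough sketch (host and port are assumptions; substitute your NameNode's), you can watch the startup phases through the NameNode's StartupProgress JMX bean instead of waiting on the Ambari progress bar:

```shell
#!/bin/sh
# Build the StartupProgress JMX URL for a given NameNode host and HTTP port.
startup_progress_url() {
  echo "http://$1:$2/jmx?qry=Hadoop:service=NameNode,name=StartupProgress"
}

# Hypothetical host from the thread above; adjust for your cluster.
# Fails harmlessly if the host is unreachable from where you run this.
curl -s "$(startup_progress_url chtcuxhd02 50070)" || true
```

The JSON response shows per-phase progress (LoadingFsImage, LoadingEdits, SafeMode), which tells you whether the start is genuinely stuck or just slow; a retry succeeding, as described above, typically means the first attempt simply hit Ambari's timeout.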


Facing the same issue while enabling NameNode HA.
