Created 01-24-2018 01:17 PM
I added a new DataNode to the cluster.
While running the load balancing, neither the NameNode nor the standby NameNode will start.
The error log is below:
2018-01-24 18:45:30,853 - call['hdfs haadmin -ns HACluster -getServiceState nn1'] {'logoutput': True, 'user': 'hdfs'}
18/01/24 18:45:32 INFO ipc.Client: Retrying connect to server: chtcuxhd02/172.16.0.99:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From chtcuxhd03/172.16.0.123 to chtcuxhd02:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2018-01-24 18:45:32,900 - call returned (255, '18/01/24 18:45:32 INFO ipc.Client: Retrying connect to server: chtcuxhd02/172.16.0.99:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)\nOperation failed: Call From chtcuxhd03/172.16.0.123 to chtcuxhd02:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused')
2018-01-24 18:45:32,901 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -s '"'"'http://chtcuxhd03:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmprRLtzf 2>/tmp/tmpM0oDpk''] {'quiet': False}
2018-01-24 18:45:33,024 - call returned (7, '')
2018-01-24 18:45:33,025 - Getting jmx metrics from NN failed. URL: http://chtcuxhd03:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 38, in get_value_from_jmx
    _, data, _ = get_user_call_output(cmd, user=run_user, quiet=False)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output
    raise ExecutionFailed(err_msg, code, files_output[0], files_output[1])
ExecutionFailed: Execution of 'curl -s 'http://chtcuxhd03:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' 1>/tmp/tmprRLtzf 2>/tmp/tmpM0oDpk' returned 7.
2018-01-24 18:45:33,026 - call['hdfs haadmin -ns HACluster -getServiceState nn2'] {'logoutput': True, 'user': 'hdfs'}
18/01/24 18:45:35 INFO ipc.Client: Retrying connect to server: chtcuxhd03/172.16.0.123:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From chtcuxhd03/172.16.0.123 to chtcuxhd03:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2018-01-24 18:45:35,127 - call returned (255, '18/01/24 18:45:35 INFO ipc.Client: Retrying connect to server: chtcuxhd03/172.16.0.123:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)\nOperation failed: Call From chtcuxhd03/172.16.0.123 to chtcuxhd03:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused')
2018-01-24 18:45:35,127 - NameNode HA states: active_namenodes = [], standby_namenodes = [], unknown_namenodes = [(u'nn1', 'chtcuxhd02:50070'), (u'nn2', 'chtcuxhd03:50070')]
2018-01-24 18:45:35,128 - Will retry 1 time(s), caught exception: No active NameNode was found.. Sleeping for 5 sec(s)
Thanks in advance
Created 01-16-2019 03:14 PM
Same issue here while enabling HDFS NameNode HA. In the Enable NameNode HA wizard, the start NameNode process gets stuck at 35% and fails after some time, but it completes successfully on retry.
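For anyone hitting the same log: "Connection refused" on port 8020 and curl exit code 7 (failed to connect to host) both suggest that no NameNode process is listening on either host, rather than an HA state problem as such. Before digging further, it may help to confirm that from the Ambari host. Below is a minimal Python 3 sketch, not part of Ambari; the helper names port_open and check_namenodes are hypothetical, and the ports 8020 (RPC) and 50070 (HTTP) are simply the ones that appear in the log above.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


def check_namenodes(namenodes, ports=(8020, 50070), probe=port_open):
    """Map every (host, port) pair to True (reachable) or False."""
    return {(host, port): probe(host, port)
            for host in namenodes
            for port in ports}
```

Calling check_namenodes(["chtcuxhd02", "chtcuxhd03"]) returns a dict keyed by (host, port). If every value is False, the NameNode processes are most likely not running at all, and the actual reason should be in the NameNode's own log on each host, which is not included in the output pasted above.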