Fail Start HDFS (hdp 2.6) after restart

Contributor
2017-08-18 14:44:47,767 - Getting jmx metrics from NN failed. URL: http://hdf-01.kyivstar.ua:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 42, in get_value_from_jmx
    return data_dict["beans"][0][property]
IndexError: list index out of range
2017-08-18 14:44:50,151 - Getting jmx metrics from NN failed. URL: http://hdf-02.kyivstar.ua:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 42, in get_value_from_jmx
    return data_dict["beans"][0][property]
IndexError: list index out of range
2017-08-18 14:44:57,548 - Getting jmx metrics from NN failed. URL: http://hdf-01.kyivstar.ua:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 42, in get_value_from_jmx
    return data_dict["beans"][0][property]
IndexError: list index out of range
2017-08-18 14:44:59,961 - Getting jmx metrics from NN failed. URL: http://hdf-02.kyivstar.ua:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 42, in get_value_from_jmx
    return data_dict["beans"][0][property]
IndexError: list index out of range
2017-08-18 14:45:07,393 - Getting jmx metrics from NN failed. URL: http://hdf-02.kyivstar.ua:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 42, in get_value_from_jmx
    return data_dict["beans"][0][property]
IndexError: list index out of range
2017-08-18 14:47:29,227 - Getting jmx metrics from NN failed. URL: http://hdf-02.kyivstar.ua:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 38, in get_value_from_jmx
    _, data, _ = get_user_call_output(cmd, user=run_user, quiet=False)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output
    raise ExecutionFailed(err_msg, code, files_output[0], files_output[1])
ExecutionFailed: Execution of 'curl -s 'http://hdf-02.kyivstar.ua:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' 1>/tmp/tmpj5Dh08 2>/tmp/tmpMWPfsa' returned 52.
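
The two failure modes in the log above can be told apart before changing any settings. An `IndexError` on `data_dict["beans"][0]` means the JMX servlet answered with an empty `beans` list, and curl exit code 52 means the server sent no reply at all. The following is a minimal sketch of a defensive parser for the same reply, not Ambari's actual implementation; the function name `get_value_from_jmx_text` is hypothetical, and `tag.HAState` is used only as an example property.

```python
import json


def get_value_from_jmx_text(text, prop):
    """Return `prop` from the first bean of a JMX JSON reply, or None.

    Hypothetical helper for illustration; it is not part of Ambari.
    """
    if not text or not text.strip():
        # Empty body: the same situation curl reports as exit code 52.
        return None
    try:
        data = json.loads(text)
    except ValueError:
        # The reply was not valid JSON (for example, a truncated response).
        return None
    if not isinstance(data, dict):
        return None
    beans = data.get("beans") or []
    if not beans:
        # {"beans": []}: the web server is up, but the queried bean was not found.
        return None
    return beans[0].get(prop)


# An empty bean list is the case that raises IndexError in the traceback.
print(get_value_from_jmx_text('{"beans": []}', "tag.HAState"))
print(get_value_from_jmx_text('{"beans": [{"tag.HAState": "active"}]}', "tag.HAState"))
```

An empty `beans` list usually suggests that the NameNode web endpoint is reachable but the NameNode has not finished starting, so the NameNode log is the next place to look.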
4 REPLIES

Re: Fail Start HDFS (hdp 2.6) after restart

Contributor

Please help: how can I troubleshoot this?

Re: Fail Start HDFS (hdp 2.6) after restart

Expert Contributor
@Dmitro Vasilenko

This happens because the check is not given enough time to establish the connection. As a workaround, you can do the following.

In the file:

 /var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py

look for the line:

@retry(times=5, sleep_time=5, backoff_factor=2, err_class=Fail)

Here we need to increase the sleep time and the number of retries, for example:

@retry(times=10, sleep_time=25, backoff_factor=2, err_class=Fail)
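
To get a feel for how much waiting each setting allows, the delays can be added up. This is only a rough sketch under stated assumptions: it treats `times` as the total number of attempts and multiplies the delay by `backoff_factor` after every failure with no upper limit. Ambari's real `retry` decorator may cap the sleep time or count attempts differently, so treat the totals as an upper-bound estimate. The helper name `backoff_schedule` is hypothetical.

```python
def backoff_schedule(times, sleep_time, backoff_factor):
    """Return the sleeps (in seconds) between `times` attempts.

    Assumes an uncapped exponential backoff; hypothetical helper,
    not the Ambari decorator itself.
    """
    delays = []
    delay = sleep_time
    for _ in range(times - 1):  # no sleep after the final attempt
        delays.append(delay)
        delay *= backoff_factor
    return delays


# Original setting: 5 + 10 + 20 + 40 seconds of waiting.
print(sum(backoff_schedule(5, 5, 2)))    # 75
# Suggested setting: 25 * (2**9 - 1) seconds of waiting.
print(sum(backoff_schedule(10, 25, 2)))  # 12775
```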

Hope this helps.

Thanks

Venkat

Re: Fail Start HDFS (hdp 2.6) after restart

Contributor

My config:

[root@hdf-02 ~]# grep -i @retry /var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py
@retry(times=125, sleep_time=5, backoff_factor=2, err_class=Fail)
[root@hdf-02 ~]#

Re: Fail Start HDFS (hdp 2.6) after restart

Contributor
I also have an error in the JournalNode log:

[root@hdf-01 hdfs]# pwd
/var/log/hadoop/hdfs
[root@hdf-01 hdfs]# tail -30 hadoop-hdfs-journalnode-hdf-01.log
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)
2017-08-18 17:48:01,743 WARN  namenode.FSImage (EditLogFileInputStream.java:scanEditLog(364)) - After resync, position is 1011712
2017-08-18 17:48:01,743 WARN  namenode.FSImage (EditLogFileInputStream.java:scanEditLog(359)) - Caught exception after scanning through 0 ops from /hadoop/hdfs/journalnode/kspoc/current/edits_inprogress_0000000000002836871 while determining its valid length. Position was 1011712
java.io.IOException: Can't scan a pre-transactional edit log.
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:4974)
        at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:245)
        at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:355)
        at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:551)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:192)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:152)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:90)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:99)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:189)
        at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:224)
        at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25431)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)
2017-08-18 17:48:01,743 WARN  namenode.FSImage (EditLogFileInputStream.java:scanEditLog(364)) - After resync, position is 1011712
[root@hdf-01 hdfs]#