Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Nodemanager fails to start

Solved Go to solution

Nodemanager fails to start

New Contributor

Hello Hortonworks Community,

I'm having some issues with two of my nodemanagers on a 4 node cluster. This cluster is running on CentOS 7 with HDP 2.5. I noticed 2/4 nodemanagers being started so my first attempt to resolve the situation was to start the two nodemanagers from the ambari front end. After starting both nodemanagers the same number was being reported: 2/4 started. Then, I tried a second possible solution. I removed the two nodemanagers that did not start and reinstalled them. This did not work either. I am looking at the log and this is the reason for the failed start: (/var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-<FQDN>.log)

2017-03-01 09:51:05,115 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service NodeManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:178)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:220)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:546)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:594)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
        at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
        at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
        at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
        at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:966)
        at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:953)
        at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:200)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        ... 5 more
2017-03-01 09:51:05,116 FATAL nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(549)) - Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:178)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:220)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:546)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:594)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
        at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
        at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
        at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
        at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:966)
        at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:953)
        at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:200)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        ... 5 more
2017-03-01 09:51:05,120 INFO  nodemanager.NodeManager (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at <FQDN>/<IP>
************************************************************/

Does anyone have any ideas on how to resolve this problem?

Thanks,

-Jose

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Nodemanager fails to start

Expert Contributor

Hi Jose,

Maybe a sst file got corrupt can you try to remove the folder of /var/log/hadoop-yarn/nodemanager/recovery-state from failed nodemanagers and check if starts?

These files stays in the system even if you decomission the nodes.

.

4 REPLIES 4
Highlighted

Re: Nodemanager fails to start

Expert Contributor

Hi Jose,

From error log message I found that it's because of checksum mismatch. Please refer below links. Hope it will work.

1. http://stackoverflow.com/questions/15434709/checksum-exception-when-reading-from-or-copying-to-hdfs-...

2. https://issues.apache.org/jira/browse/HDFS-6804

Thanks,

Mahesh

Re: Nodemanager fails to start

Rising Star

Thank you for the answer; however, both are not for levelDB, which is used in node manager.

Do you have any idea to initialize levelDB. I try to find it, but i can't find any good article.

Re: Nodemanager fails to start

Expert Contributor

Hi Jose,

Maybe a sst file got corrupt can you try to remove the folder of /var/log/hadoop-yarn/nodemanager/recovery-state from failed nodemanagers and check if starts?

These files stays in the system even if you decomission the nodes.

.

Re: Nodemanager fails to start

New Contributor

Hey Juan,

Thanks for this answer. This actually did fix the nodemanager situation.

Don't have an account?
Coming from Hortonworks? Activate your account here