Created 08-18-2019 08:49 PM
Hi,
I am unable to start the NodeManager on a node and get the error below. Any help in resolving this is much appreciated.
Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 1 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/000042.sst
at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:181)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:245)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:562)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:609)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 1 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/000042.sst
at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:950)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:937)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:210)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
... 5 more
Created 08-18-2019 11:29 PM
Hi,
The error message shows that the LevelDB database holding the YARN NodeManager state store is corrupt:
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 1 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/000042.sst
The solution is to clean up the state store database so that it is recreated on startup:
1. In CM, make sure the affected NodeManager is stopped.
2. Back up the contents of /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state to a different directory.
3. Delete all the contents of /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state.
4. Start the affected NodeManager.
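If it helps, steps 2 and 3 could be scripted roughly as follows. This is only a sketch, not an official procedure: the function name backup_and_clear and the timestamped backup folder name are made up for illustration, and it assumes the NodeManager is already stopped and that the script runs as a user allowed to read and delete the state files.

```python
import shutil
import time
from pathlib import Path


def backup_and_clear(state_dir, backup_root):
    """Copy state_dir into a timestamped folder under backup_root, then empty state_dir.

    Hypothetical helper for illustration; run it only while the NodeManager is stopped.
    """
    state_dir = Path(state_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_dir = Path(backup_root) / f"yarn-nm-state-{stamp}"

    # Step 2: back up first. copytree raises an error if the destination
    # already exists, so an earlier backup is never overwritten.
    shutil.copytree(state_dir, backup_dir)

    # Step 3: delete the contents but keep the directory itself, so its
    # ownership and permissions stay as they were.
    for entry in state_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()

    return backup_dir
```

For example, backup_and_clear("/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state", "/var/tmp") would leave a copy under /var/tmp (an assumed backup location, pick whatever suits your node) and an empty state directory, after which the NodeManager can be started from CM.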
Created 10-11-2019 03:59 AM
Hi,
Did you try backing up the folder /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state, deleting all of its contents, and restarting the affected NodeManager?
Please share an update.
Thanks
AKR