Unable to start Node manager

Expert Contributor

Hi,

I am unable to start the NodeManager on one node and get the error below. Any help in resolving this is much appreciated.

Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 1 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/000042.sst
	at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:181)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:245)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:562)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:609)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 1 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/000042.sst
	at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
	at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
	at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
	at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:950)
	at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:937)
	at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:210)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	... 5 more

 

1 ACCEPTED SOLUTION

Cloudera Employee

Hi,

The error message confirms that the LevelDB database holding the YARN NodeManager recovery state store is corrupt:

org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 1 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/000042.sst
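
To confirm which file LevelDB is complaining about, the message can be parsed and the path checked on disk. This is only a minimal sketch: the regular expression is modeled on the single log line quoted above, and the helper name `parse_missing_file` is made up for illustration, not part of Hadoop or LevelDB.

```python
import re
from pathlib import Path

# Pattern modeled on the one log line quoted in this thread (an assumption;
# other LevelDB versions may word the message differently).
MISSING_RE = re.compile(r"Corruption: (\d+) missing files; e\.g\.: (\S+)")


def parse_missing_file(log_line):
    """Return (missing_count, example_path) from a LevelDB corruption message,
    or None if the line does not match the expected wording."""
    match = MISSING_RE.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), Path(match.group(2))
```

Calling `parse_missing_file(line)` on the exception text gives the count and the example path; `path.exists()` on the affected node then shows whether the `.sst` file is really gone.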

The solution is to clean up the state store directory so that the NodeManager recreates the database on startup:

1. In Cloudera Manager (CM), make sure the affected NodeManager is stopped.
2. Back up the contents of /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state to a different directory.
3. Delete all the contents of /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state.
4. Start the affected NodeManager.
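
Steps 2 and 3 can be scripted. The following is a rough sketch rather than a supported tool: the function name and the timestamped backup layout are my own choices. It assumes the NodeManager is already stopped and that the script runs as a user allowed to read and modify the directory.

```python
import shutil
import time
from pathlib import Path


def backup_and_clear_state_store(state_dir, backup_root):
    """Copy state_dir into a timestamped folder under backup_root,
    then delete everything inside state_dir (the directory itself is kept)."""
    state_dir = Path(state_dir)
    backup_root = Path(backup_root)
    if not state_dir.is_dir():
        raise FileNotFoundError(f"state store directory not found: {state_dir}")

    # Step 2: back up the current contents before touching anything.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_dir = backup_root / f"{state_dir.name}-{stamp}"
    shutil.copytree(state_dir, backup_dir)

    # Step 3: remove the contents but keep the directory, so its ownership
    # and permissions stay as they were.
    for entry in state_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    return backup_dir
```

For the path in this thread, the call would be `backup_and_clear_state_store("/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state", "/var/tmp/yarn-nm-state-backups")`, where the backup location is a hypothetical example. Start the NodeManager from CM afterwards (step 4).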


2 REPLIES

Cloudera Employee

Hi,

 

Did you try backing up the folder /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state, deleting all of its contents, and then restarting the affected NodeManager?

Please share an update.

 

Thanks

AKR