YARN NodeManager not starting

New Contributor
 
1 ACCEPTED SOLUTION


@MARUTHU PANDIYAN JAYARAMAN

If yarn.nodemanager.recovery.enabled is set to true, please set it to false and try starting the NodeManager again. This should get you unblocked. The NodeManager is failing during recovery because its LevelDB state store appears to be corrupted:

2016-11-16 23:49:26,076 INFO  recovery.NMLeveldbStateStoreService (NMLeveldbStateStoreService.java:initStorage(927)) - Using state database at /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state for recovery
2016-11-16 23:49:26,299 INFO  recovery.NMLeveldbStateStoreService$LeveldbLogger (NMLeveldbStateStoreService.java:log(969)) - Recovering log #362
2016-11-16 23:49:26,306 INFO  recovery.NMLeveldbStateStoreService$LeveldbLogger (NMLeveldbStateStoreService.java:log(969)) - Delete type=3 #361
2016-11-16 23:49:26,306 INFO  recovery.NMLeveldbStateStoreService$LeveldbLogger (NMLeveldbStateStoreService.java:log(969)) - Delete type=0 #362
2016-11-16 23:49:26,681 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService failed in state INITED; cause: org.iq80.leveldb.DBException: Invalid argument: not an sstable (bad magic number)
org.iq80.leveldb.DBException: Invalid argument: not an sstable (bad magic number)
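
Before restarting, it can help to confirm what the flag is currently set to. Below is a minimal sketch, assuming the property's full name is yarn.nodemanager.recovery.enabled and that yarn-site.xml uses the usual Hadoop configuration/property/name/value layout. The function name and the sample XML are illustrative only; in practice you would read the yarn-site.xml that your NodeManager actually loads, and on a managed cluster you would change the value through the management UI rather than by hand.

```python
import xml.etree.ElementTree as ET

RECOVERY_PROPERTY = "yarn.nodemanager.recovery.enabled"


def get_hadoop_property(xml_text, name):
    """Return the value of a Hadoop config property, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Hypothetical sample content, not taken from the poster's cluster.
SAMPLE_YARN_SITE = """
<configuration>
  <property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    value = get_hadoop_property(SAMPLE_YARN_SITE, RECOVERY_PROPERTY)
    print(f"{RECOVERY_PROPERTY} = {value}")
```

If the function returns "true", that is the setting this answer suggests switching to false.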


10 REPLIES

Super Guru

@MARUTHU PANDIYAN JAYARAMAN

Can you please share logs?

New Contributor

Cloudera Employee

From your log, it looks like you have some corrupted files. Double-check with this command:

 hdfs fsck /

See here for how to check for and fix corrupt HDFS files: http://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hdfs-files/19216037#19216037

New Contributor

No corrupt blocks.

Status: HEALTHY
 Total size: 2286120271 B
 Total dirs: 1826
 Total files: 3378
 Total symlinks: 0 (Files currently being written: 9)
 Total blocks (validated): 2905 (avg. block size 786960 B) (Total open file blocks (not validated): 3)
 Minimally replicated blocks: 2905 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 2905 (100.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 13902 (82.71554 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Wed Nov 16 14:10:34 UTC 2016 in 2531 milliseconds
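
For anyone who wants to check a summary like this one in a script, here is a minimal sketch that pulls a few counters out of the text. The function name and the regular expression are illustrative assumptions, it only handles fields of the form "Label: number", and the sample string is an excerpt copied from the output above.

```python
import re


def parse_fsck_counters(summary, labels):
    """Extract integer counters such as 'Corrupt blocks: 0' from fsck summary text."""
    counters = {}
    for label in labels:
        # Match the label, a colon, optional whitespace, then the leading integer.
        match = re.search(re.escape(label) + r":\s*(\d+)", summary)
        if match:
            counters[label] = int(match.group(1))
    return counters


# Excerpt of the summary posted in this thread.
SAMPLE_SUMMARY = (
    "Under-replicated blocks: 2905 (100.0 %) Mis-replicated blocks: 0 (0.0 %) "
    "Default replication factor: 3 Average block replication: 1.0 "
    "Corrupt blocks: 0 Missing replicas: 13902 (82.71554 %) "
    "Number of data-nodes: 1 Number of racks: 1"
)

if __name__ == "__main__":
    labels = ["Corrupt blocks", "Under-replicated blocks", "Number of data-nodes"]
    for label, count in parse_fsck_counters(SAMPLE_SUMMARY, labels).items():
        print(f"{label}: {count}")
```

Here the corrupt block count is 0. The under-replicated count is what you would expect with a default replication factor of 3 and only one data node, so it probably says nothing about the NodeManager failure.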

Cloudera Employee

Try restarting ambari-agent, or provide the NodeManager log from /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-xxx.com.log

New Contributor

New Contributor

Problem solved. The NodeManager is up after setting the recovery flag to false. Thank you, Mugdha.

New Contributor

I am also getting the error "localhost: ERROR: Cannot set priority of nodemanager process 41819". Can someone provide an answer?
