Yarn node manager not starting.

1 ACCEPTED SOLUTION

@MARUTHU PANDIYAN JAYARAMAN

If yarn.nodemanager.recovery.enabled is set to true, please set it to false and try starting the NodeManager again. This should unblock you. The NodeManager is failing while recovering its LevelDB state store:

2016-11-16 23:49:26,076 INFO  recovery.NMLeveldbStateStoreService (NMLeveldbStateStoreService.java:initStorage(927)) - Using state database at /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state for recovery
2016-11-16 23:49:26,299 INFO  recovery.NMLeveldbStateStoreService$LeveldbLogger (NMLeveldbStateStoreService.java:log(969)) - Recovering log #362
2016-11-16 23:49:26,306 INFO  recovery.NMLeveldbStateStoreService$LeveldbLogger (NMLeveldbStateStoreService.java:log(969)) - Delete type=3 #361
2016-11-16 23:49:26,306 INFO  recovery.NMLeveldbStateStoreService$LeveldbLogger (NMLeveldbStateStoreService.java:log(969)) - Delete type=0 #362
2016-11-16 23:49:26,681 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService failed in state INITED; cause: org.iq80.leveldb.DBException: Invalid argument: not an sstable (bad magic number)
org.iq80.leveldb.DBException: Invalid argument: not an sstable (bad magic number)
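
To confirm how the flag is currently set before changing it, something like the following sketch could be used. It reads a yarn-site.xml document and reports the value of yarn.nodemanager.recovery.enabled. The embedded sample XML and the helper names (get_property, recovery_enabled) are assumptions made for illustration; on a real node you would read the actual yarn-site.xml instead, and on an Ambari-managed cluster the value is probably best changed through Ambari rather than by editing the file by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample for illustration only; a real yarn-site.xml
# contains many more properties.
SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.recovery.dir</name>
    <value>/var/log/hadoop-yarn/nodemanager/recovery-state</value>
  </property>
</configuration>
"""


def get_property(xml_text, name):
    """Return the value of a Hadoop-style <property>, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def recovery_enabled(xml_text):
    """True only if the recovery flag is present and set to 'true'."""
    value = get_property(xml_text, "yarn.nodemanager.recovery.enabled")
    return value is not None and value.lower() == "true"


if __name__ == "__main__":
    if recovery_enabled(SAMPLE_YARN_SITE):
        print("NodeManager recovery is enabled; set it to false and restart.")
    else:
        print("NodeManager recovery is not enabled.")
```

Disabling the flag means the NodeManager no longer opens the state database at startup, which is why it gets past the "not an sstable (bad magic number)" failure shown in the log.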

10 REPLIES

Master Guru

@MARUTHU PANDIYAN JAYARAMAN

Can you please share the logs?

Contributor

From your log, it looks like you have some corrupted files.

Double-check with this command: hdfs fsck /

See here for how to check and fix this: http://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hdfs-files/19216037#19216037

Explorer

No corrupt blocks.

Status: HEALTHY
 Total size: 2286120271 B
 Total dirs: 1826
 Total files: 3378
 Total symlinks: 0 (Files currently being written: 9)
 Total blocks (validated): 2905 (avg. block size 786960 B) (Total open file blocks (not validated): 3)
 Minimally replicated blocks: 2905 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 2905 (100.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 13902 (82.71554 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Wed Nov 16 14:10:34 UTC 2016 in 2531 milliseconds
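
For reading a summary like this, a rough sketch such as the one below can pull out the fields that matter here. The excerpt string contains only fields copied from the output above, and the helper names (fsck_field, under_replication_expected) are made up for this example. It also shows why every block is reported as under-replicated even though the status is HEALTHY: the default replication factor is 3, but there is only one DataNode to hold replicas.

```python
import re

# Excerpt of the fsck summary above (only the fields used below).
FSCK_SUMMARY = (
    "Status: HEALTHY "
    "Under-replicated blocks: 2905 (100.0 %) "
    "Default replication factor: 3 "
    "Corrupt blocks: 0 "
    "Number of data-nodes: 1"
)


def fsck_field(summary, label):
    """Return the first token after 'label:' in the summary, or None."""
    match = re.search(re.escape(label) + r":\s*(\S+)", summary)
    return match.group(1) if match else None


def under_replication_expected(summary):
    """True when there are fewer DataNodes than the replication factor."""
    nodes = int(fsck_field(summary, "Number of data-nodes"))
    factor = int(fsck_field(summary, "Default replication factor"))
    return nodes < factor


if __name__ == "__main__":
    print("Status:", fsck_field(FSCK_SUMMARY, "Status"))
    print("Corrupt blocks:", fsck_field(FSCK_SUMMARY, "Corrupt blocks"))
    print("Under-replication expected:", under_replication_expected(FSCK_SUMMARY))
```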

Contributor

Try restarting ambari-agent, or provide the yarn-nodemanager logs from /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-xxx.com.log

Explorer

Problem solved. The NodeManager is up after setting this flag to false. Thank you, Mugdha.

New Contributor

I am also getting the error "localhost: ERROR: Cannot set priority of nodemanager process 41819". Can someone help me with this?