Created 01-27-2016 05:04 AM
One of our clients has asked us to move the default log location of every service to a different mount point. For example, the HDFS log prefix was moved from /var/log/hadoop to /hdp/logs/hadoop via API calls.
Everything restarted smoothly, but only one of the 5 NodeManagers (NM) comes up, and a manual restart only works on the first NM.
All the other NMs throw the same error, shown below:
STARTUP_MSG: build = git@github.com:hortonworks/hadoop.git -r ef0582ca14b8177a3cbb6376807545272677d730; compiled by 'jenkins' on 2015-12-16T03:01Z
STARTUP_MSG: java = 1.7.0_67
************************************************************/
2016-01-26 15:01:25,155 INFO nodemanager.NodeManager (LogAdapter.java:info(45)) - registered UNIX signal handlers for [TERM, HUP, INT]
2016-01-26 15:01:26,283 INFO recovery.NMLeveldbStateStoreService (NMLeveldbStateStoreService.java:initStorage(927)) - Using state database at /hdp/logs/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state for recovery
2016-01-26 15:01:26,313 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService failed in state INITED; cause: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /hdp/logs/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource temporarily unavailable
org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /hdp/logs/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource temporarily unavailable
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:930)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:178)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:220)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:537)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:585)
2016-01-26 15:01:26,316 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service NodeManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /hdp/logs/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource temporarily unavailable
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /hdp/logs/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource temporarily unavailable
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:178)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:220)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:537)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:585)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /hdp/logs/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource temporarily unavailable
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:930)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    ... 5 more
2016-01-26 15:01:26,317 FATAL nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(540)) - Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /hdp/logs/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource temporarily unavailable
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:178)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:220)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:537)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:585)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /hdp/logs/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/LOCK: Resource temporarily unavailable
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:930)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    ... 5 more
2016-01-26 15:01:26,319 INFO nodemanager.NodeManager (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at bvluxhdpdn05.conocophillips.net/158.139.121.115
************************************************************/
As we can see, it is not complaining that the LOCK file is missing but that it is unavailable, because whichever NM starts first acquires the LOCK (remember, this is a single shared mount point and not a local file system).
If I change the log location back to the local file system, for example /tmp/yarnlogs, it works smoothly, since each NM gets its own LOCK file on the local file system of the node where it is installed.
Has anyone faced this issue, and can you please suggest a fix?
Thanks, Mayank
Created 01-27-2016 07:55 AM
Hi @mkataria ,
did I understand that correctly: do all the NodeManagers have some kind of network storage mounted, and do they all want to write to /hdp/logs/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state?
This won't work, since every NM wants to keep its own state in the directory ..../yarn-nm-state, so only one NM can create the LOCK file there (alongside the files that hold the state).
Logging to a central directory is difficult in such cases.
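To illustrate why only the first NM gets through, here is a minimal sketch of the same kind of contention on a local temporary directory. It assumes the state store takes a non-blocking exclusive lock on the LOCK file, which is what the "Resource temporarily unavailable" (EAGAIN) message suggests; the helper names and the temporary directory are made up for the demo, and nothing here touches a real NodeManager.

```python
import errno
import fcntl
import os
import subprocess
import sys
import tempfile

# Program run in a second process: try to take a non-blocking exclusive
# lock on the file given as argv[1] and report the outcome on stdout.
CHILD = """
import fcntl, sys
with open(sys.argv[1], "a") as f:
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        print("acquired")
    except OSError as e:
        print("failed", e.errno)
"""


def try_lock_in_other_process(lock_path):
    """Start a separate process that attempts to lock lock_path."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, lock_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def demo():
    """Return what a second process sees while the lock is held and after."""
    with tempfile.TemporaryDirectory() as state_dir:
        lock_path = os.path.join(state_dir, "LOCK")  # hypothetical state dir
        with open(lock_path, "a") as first:
            # The "first NM": takes the lock and keeps it while running.
            fcntl.lockf(first, fcntl.LOCK_EX | fcntl.LOCK_NB)
            while_held = try_lock_in_other_process(lock_path)
        # Closing the file released the lock, so a new attempt can succeed.
        after_release = try_lock_in_other_process(lock_path)
    return while_held, after_release


if __name__ == "__main__":
    print(demo())
```

The second process fails while the first one holds the lock and succeeds once it is released, which matches the behaviour described above: the file is present, but only one holder at a time can own it.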
One solution could be to put each NM in a different config group, and specify the log directory for each config group, e.g.
/hdp/logs/hadoop-yarn/nm1
/hdp/logs/hadoop-yarn/nm2
...
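
As a rough sketch of the naming scheme above, the per-node directories could be derived from the host names instead of being typed by hand. The host names used in the usage note are placeholders, and using the short host name as the directory name is an assumption, not something Ambari requires.

```python
import posixpath


def per_host_log_dirs(base_dir, hostnames):
    """Map each NodeManager host to its own directory under base_dir.

    The short host name (the part before the first dot) is used as the
    directory name; a collision between two hosts raises ValueError so
    that two NMs never end up sharing one state directory.
    """
    dirs = {}
    used = set()
    for host in hostnames:
        short = host.split(".")[0]
        path = posixpath.join(base_dir, short)
        if path in used:
            raise ValueError(f"duplicate log directory {path!r} for host {host!r}")
        used.add(path)
        dirs[host] = path
    return dirs


if __name__ == "__main__":
    # Placeholder host names, for illustration only.
    for host, path in per_host_log_dirs(
        "/hdp/logs/hadoop-yarn", ["dn01.example.com", "dn02.example.com"]
    ).items():
        print(host, "->", path)
```

Each resulting path would then be entered as the log directory of the config group that contains exactly that host.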
Regards, Gerd
Created 01-27-2016 04:24 PM
@Gerd Koenig I had the same doubts, thanks for confirming. Could you share some details on putting the NMs into different config groups, at your leisure?
Created on 01-27-2016 05:50 PM - edited 08-19-2019 04:10 AM
Hi @mkataria ,
sure, I'll try my best.
First click on service 'HDFS' in Ambari, then
In the next dialog, create one config group per NodeManager, give it a corresponding name, and assign that node to the config group.
Then go back to the "general" HDFS config page (picture 1), select a config group, and adjust the log destination for that particular NodeManager node (== config group).
...and restart HDFS 😉
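
Since the original change was made through API calls, the same steps could probably be scripted. The sketch below only builds the JSON body for one config group and does not send anything. The field names (ConfigGroup, group_name, tag, hosts, desired_configs), the yarn-env type, and the yarn_log_dir_prefix property are written from memory of the Ambari REST API and should be treated as assumptions to check against the Ambari version in use; the cluster and host names are placeholders. The tag defaults to YARN on the assumption that the NodeManager is managed under that service.

```python
import json


def config_group_payload(cluster, host, log_dir_prefix, service_tag="YARN"):
    """Build a request body for one per-host config group.

    All field names are assumptions about the Ambari config group
    resource; verify them against the Ambari documentation before use.
    """
    short = host.split(".")[0]
    return [{
        "ConfigGroup": {
            "cluster_name": cluster,
            "group_name": f"nm-{short}",
            "tag": service_tag,
            "description": f"Per-host log directory for {host}",
            "hosts": [{"host_name": host}],
            "desired_configs": [{
                "type": "yarn-env",           # assumed config type
                "tag": f"nm-{short}-v1",      # any unique version tag
                "properties": {"yarn_log_dir_prefix": log_dir_prefix},
            }],
        }
    }]


if __name__ == "__main__":
    body = config_group_payload(
        "mycluster", "dn01.example.com", "/hdp/logs/hadoop-yarn/dn01"
    )
    print(json.dumps(body, indent=2))
```

One such body per NodeManager host mirrors the one-group-per-node setup described in the steps above.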
Regards, Gerd
Created 01-27-2016 11:26 AM
@mkataria check the disk on the nodes, permissions, mount options, space, etc.
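
A quick way to run part of these checks on each node is sketched below. It only covers existence, write permission for the current user, and free space; mount options still need to be checked separately. The function name and the 1 GiB default threshold are arbitrary choices for the example.

```python
import os
import shutil


def check_log_dir(path, min_free_bytes=1024 ** 3):
    """Report basic health of a log directory for the current user."""
    report = {
        "exists": os.path.isdir(path),
        "writable": False,
        "free_bytes": None,
        "enough_space": False,
    }
    if report["exists"]:
        # Write and search permission are both needed to create files.
        report["writable"] = os.access(path, os.W_OK | os.X_OK)
        free = shutil.disk_usage(path).free
        report["free_bytes"] = free
        report["enough_space"] = free >= min_free_bytes
    return report


if __name__ == "__main__":
    print(check_log_dir("/hdp/logs/hadoop-yarn"))
```

Running it as the user that starts the NodeManager gives a more meaningful permission result than running it as root.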
Created 01-27-2016 04:23 PM
@Artem Ervits I checked all of those and none of them seems to be an issue.