
Unable to start Node Manager - java.lang.UnsatisfiedLinkError

Explorer

I have a problem with new nodes added to an HDP 3.0.1 cluster: the HDFS service is fine, but the NodeManager service fails to start with these errors:

 /var/lib/ambari-agent/data/errors-26141.txt

resource_management.core.exceptions.ExecutionFailed: Execution of 'ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/hdp/3.0.1.0-187/hadoop/libexec && /usr/hdp/3.0.1.0-187/hadoop-yarn/bin/yarn --config /usr/hdp/3.0.1.0-187/hadoop/conf --daemon start nodemanager' returned 1. Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.
Command line is not complete. Try option "help"
TERM environment variable not set.
ERROR: Cannot set priority of nodemanager process 34389

(The TERM environment variable is actually set.)

 

NodeManager.log 

 

STARTUP_MSG: java = 1.8.0_112
************************************************************/
2020-03-04 15:18:35,735 INFO nodemanager.NodeManager (LogAdapter.java:info(51)) - registered UNIX signal handlers for [TERM, HUP, INT]
2020-03-04 15:18:36,133 INFO recovery.NMLeveldbStateStoreService (NMLeveldbStateStoreService.java:openDatabase(1540)) - Using state database at /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state for recovery
2020-03-04 15:18:36,143 ERROR nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(936)) - Error starting NodeManager
java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, Permission denied]
at org.fusesource.hawtjni.runtime.Library.doLoad(Library.java:182)
at org.fusesource.hawtjni.runtime.Library.load(Library.java:140)
at org.fusesource.leveldbjni.JniDBFactory.<clinit>(JniDBFactory.java:48)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:1543)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1531)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:353)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:285)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:358)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
2020-03-04 15:18:36,149 INFO service.AbstractService (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state STOPPED

 

I have reviewed the execution permissions of the temporary directories.

  • /tmp
  • /var/tmp
  • /var/lib/ambari-agent/tmp/

The files on java.library.path were also copied over from an existing node.

The NodeManager service starts when run as the root user, but it does not start as the yarn user.
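One way to check the temp-directory permissions directly is to approximate the exclusive-create open that leveldbjni performs when it extracts its native library. This is only a sketch; the `check_writable` helper below is illustrative, not part of Hadoop:

```shell
# Illustrative helper: can the current user create a new file in a
# directory? This approximates leveldbjni's O_RDWR|O_CREAT|O_EXCL open.
check_writable() {
  dir="$1"
  probe="$dir/write_probe_$$"
  if touch "$probe" 2>/dev/null; then
    rm -f "$probe"
    echo "writable"
  else
    echo "not writable"
  fi
}

# On the cluster this would be run as the yarn user, for example:
#   sudo -u yarn sh -c '. ./check.sh; check_writable /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'
```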



Hi @san_t_o, thanks for adding more context.

 

When cleaning the indicated directories, are the libraries copied back automatically, or is it necessary to copy them manually?

--> /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/* and also on /var/lib/ambari-agent/tmp/

I was testing on my local cluster. Apologies, I meant to clear /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/; sorry for the typo in my previous comment.

 

Even before clearing these directories or altering the location, it would be best to review the startup with strace once. It traces all system-level calls, and reviewing the last call prior to the failure could give us more clues. To install strace, you can run:

yum -y install strace

 

On the problematic node:

export HADOOP_LIBEXEC_DIR=/usr/hdp/3.0.1.0-187/hadoop/libexec
strace -f -s 2000 -o problematic_node /usr/hdp/3.0.1.0-187/hadoop-yarn/bin/yarn --debug --config /usr/hdp/3.0.1.0-187/hadoop/conf --daemon start nodemanager

And on a good node, for comparison:

export HADOOP_LIBEXEC_DIR=/usr/hdp/3.0.1.0-187/hadoop/libexec
strace -f -s 2000 -o good_node /usr/hdp/3.0.1.0-187/hadoop-yarn/bin/yarn --debug --config /usr/hdp/3.0.1.0-187/hadoop/conf --daemon start nodemanager

 

The files problematic_node and good_node will contain the traces; can you attach/paste them here?
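Once you have the traces, a quick way to narrow them down is to pull out the failing system calls. This is just a sketch; the `failed_calls` helper is illustrative, and note that some ENOENT results are benign (e.g. the stat right before an exclusive create):

```shell
# Illustrative helper: list the last failing calls in an strace log.
# EACCES/EPERM lines are usually the interesting ones; ENOENT can be
# benign (e.g. a stat probing for a file that is about to be created).
failed_calls() {
  grep -E '= -1 E(ACCES|NOENT|PERM)' "$1" | tail -n 20
}

# Example: failed_calls problematic_node
```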

Explorer

hi @venkatsambath,

I attach the strace output.

 

problematic_node

good_node

 

I found these lines:

Problematic Node

49213 stat("/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-6110205147654050510.8", 0x7f19c4ef1800) = -1 ENOENT (No such file or directory)
49213 open("/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-6110205147654050510.8", O_RDWR|O_CREAT|O_EXCL, 0666) = -1 EACCES (Permission denied)

Good Node:

19290 stat("/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-280379409949290123.8", 0x7fbf276ef800) = -1 ENOENT (No such file or directory)
19290 open("/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-280379409949290123.8", O_RDWR|O_CREAT|O_EXCL, 0666) = 354

 

The permissions of "hadoop_java_io_tmpdir" on the problematic node:

# stat hadoop_java_io_tmpdir/
File: ‘hadoop_java_io_tmpdir/’
Size: 8192 Blocks: 24 IO Block: 4096 directory
Device: fd02h/64770d Inode: 29362223 Links: 39
Access: (1777/drwxrwxrwt) Uid: ( 1073/ hdfs) Gid: ( 1051/ hadoop)
Access: 2020-03-09 12:29:40.811107572 -0500
Modify: 2020-03-09 12:29:38.659084671 -0500
Change: 2020-03-09 12:29:38.659084671 -0500

 

I'll be waiting for your comments.

Regards.

49213 open("/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-6110205147654050510.8", O_RDWR|O_CREAT|O_EXCL, 0666) = -1 EACCES (Permission denied)

During this step, the process is trying to open this file and obtain a file descriptor, and access was denied:

/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-6110205147654050510.8

So far we have inspected its parent directories and haven't seen any issues with them. Can we get details of this path too:

ls -ln /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-6110205147654050510.8

stat /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-6110205147654050510.8

id yarn 
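If the temp directory itself turns out to be the problem, one possible workaround (a sketch; the path below is illustrative, and it assumes yarn-env.sh, or Ambari's yarn-env template, is the right place to set this on your cluster) is to point the NodeManager JVM at a temp directory the yarn user certainly owns:

```shell
# Hypothetical workaround in yarn-env.sh: relocate java.io.tmpdir for the
# NodeManager. /hadoop/yarn/tmp is an illustrative path, not a default;
# it would need to exist and be owned by yarn:hadoop.
export YARN_NODEMANAGER_OPTS="${YARN_NODEMANAGER_OPTS} -Djava.io.tmpdir=/hadoop/yarn/tmp"
```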

 

Explorer

@venkatsambath 

 

I think the same; we have analyzed its parent directories and apparently they are fine.

 

# ls -ln /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-6110205147654050510.8
ls: cannot access /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/libleveldbjni-64-1-6110205147654050510.8: No such file or directory
# id yarn
uid=1075(yarn) gid=1051(hadoop) groups=1051(hadoop)

 

I have managed to start the NodeManager from the command line, but I have detected that the command executed from Ambari sets the owner of the "hadoop_java_io_tmpdir" path to "hdfs:hadoop". However, I cannot identify why the yarn user lacks write and execute permissions, since the permissions are 1777 and the yarn user is a member of the hadoop group.
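Since the classic permission bits look correct, the denial may come from something that overrides them: POSIX ACLs, SELinux labels, or mount options can all reject a write on a mode-1777 directory. A sketch of further checks (the `inspect_dir` helper is illustrative):

```shell
# Illustrative helper: show what else, beyond the classic mode bits,
# might be denying access to a directory.
inspect_dir() {
  d="$1"
  stat -c 'mode=%a owner=%U:%G' "$d"    # classic permission bits and owner
  getfacl -p "$d" 2>/dev/null || true   # POSIX ACLs, if the tool is present
  ls -ldZ "$d" 2>/dev/null || true      # SELinux context, if enabled
}

# Example: inspect_dir /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir
```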

 

Regards