Created on 08-06-2017 05:58 AM - edited 09-16-2022 05:03 AM
Dear experts,
I am running HDP 2.4.3 with Ambari 2.4 on AWS EC2 instances running Red Hat Enterprise Linux Server release 7.3 (Maipo). Whenever I start the NFSGATEWAY service on a host, it stops automatically after some time. Could you please assist me with this?
Even if I kill the existing nfs3 process and restart the service, the issue persists. Please find a few details below:
ps -ef | grep nfs3
----------------------------------------------------------
root 9766 1 0 01:42 pts/0 00:00:00 jsvc.exec -Dproc_nfs3 -outfile /var/log/hadoop/root/nfs3_jsvc.out -errfile /var/log/hadoop/root/nfs3_jsvc.err -pidfile /var/run/hadoop/root/hadoop_privileged_nfs3.pid -nodetach -user hdfs -cp /usr/hdp/current/hadoop-client/conf:/usr/hdp/2.4.3.0-227/hadoop/lib/*:/usr/hdp/2.4.3.0-227/hadoop/.//*:/usr/hdp/2.4.3.0-227/hadoop-hdfs/./:/usr/hdp/2.4.3.0-227/hadoop-hdfs/lib/*:/usr/hdp/2.4.3.0-227/hadoop-hdfs/.//*:/usr/hdp/2.4.3.0-227/hadoop-yarn/lib/*:/usr/hdp/2.4.3.0-227/hadoop-yarn/.//*:/usr/hdp/2.4.3.0-227/hadoop-mapreduce/lib/*:/usr/hdp/2.4.3.0-227/hadoop-mapreduce/.//*::/usr/hdp/2.4.3.0-227/tez/*:/usr/hdp/2.4.3.0-227/tez/lib/*:/usr/hdp/2.4.3.0-227/tez/conf:/usr/hdp/2.4.3.0-227/tez/*:/usr/hdp/2.4.3.0-227/tez/lib/*:/usr/hdp/2.4.3.0-227/tez/conf -Xmx1024m -Dhdp.version=2.4.3.0-227 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/ -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.4.3.0-227/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.4.3.0-227/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.3.0-227/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.4.3.0-227 -Dhadoop.log.dir=/var/log/hadoop/ -Dhadoop.log.file=hadoop-hdfs-nfs3-ip-10-0-0-223.ap-south-1.compute.internal.log -Dhadoop.home.dir=/usr/hdp/2.4.3.0-227/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,RFA -Djava.library.path=:/usr/hdp/2.4.3.0-227/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.3.0-227/hadoop/lib/native:/usr/hdp/2.4.3.0-227/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.3.0-227/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/root -Dhadoop.id.str=hdfs -Xmx1024m -Dhadoop.security.logger=ERROR,DRFAS -Dhadoop.security.logger=INFO,RFAS 
org.apache.hadoop.hdfs.nfs.nfs3.PrivilegedNfsGatewayStarter
systemctl status rpcbind
--------------------------------------------------
● rpcbind.service - RPC bind service
   Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; indirect; vendor preset: enabled)
   Active: active (running) since Sun 2017-08-06 01:29:31 EDT; 18min ago
 Main PID: 6164 (rpcbind)
   CGroup: /system.slice/rpcbind.service
           └─6164 /sbin/rpcbind -w
Created 08-06-2017 06:30 AM
I have tried changing ulimit as suggested and restarted the gateway, but still no luck. I don't see any .log file, but I am able to get a few details, as below:
/var/log/hadoop/root
nfs3_jsvc.out
-------------------------
A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007f7b0a23bb7c, pid=19469, tid=140166720608064
#
# JRE version: (8.0_77-b03) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.77-b03 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# j java.lang.Object.<clinit>()V+0
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid19469.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
hadoop-hdfs-nfs3-XXXXXXX.out
-------------------------------------------------------
ulimit -a for privileged nfs user hdfs
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63392
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 128000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
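As a side note, the crash banner suggests "ulimit -c unlimited", and the ulimit output above shows the core file size limit is 0 for the hdfs user. A minimal sketch of a persistent pam_limits entry that would fix that, printed for review rather than written to the system (the target file name is an assumption; limits may be managed differently on your hosts):

```shell
# Hedged sketch: a pam_limits entry raising the core-file limit for the
# hdfs user. Review it, then install it yourself, e.g. as
# /etc/security/limits.d/hdfs-core.conf (file name is hypothetical).
limits='hdfs soft core unlimited
hdfs hard core unlimited'
printf '%s\n' "$limits"
```

Note that pam_limits is applied at login, so the hdfs user needs a fresh session and the gateway a restart before a future crash can leave a core file.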
Created 08-06-2017 06:53 AM
The following error line indicates that this is a JVM crash:
# An error report file with more information is saved as:
# /tmp/hs_err_pid19469.log
.
So if you can share the complete "/tmp/hs_err_pid19469.log" file here, we can check why the JVM crashed.
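If the full file is too large to paste, the headline lines can be pulled out with grep. A small sketch, where sample text stands in for the real /tmp/hs_err_pid19469.log:

```shell
# Filter the most telling lines of a HotSpot crash report: the signal,
# the "Problematic frame" header, and the failing Java frame.
sample='# SIGBUS (0x7) at pc=0x00007f7b0a23bb7c, pid=19469, tid=140166720608064
# JRE version: (8.0_77-b03) (build )
# Problematic frame:
# j java.lang.Object.<clinit>()V+0'
summary=$(printf '%s\n' "$sample" | grep -E 'SIGBUS|Problematic frame|^# j ')
printf '%s\n' "$summary"
```

On a live host you would run the same grep against the hs_err file itself instead of the sample variable.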
Created 08-06-2017 07:07 AM
hs-err-pid26771.txt: adding the latest log file, for pid 26771.
Created 08-06-2017 07:19 AM
It looks similar to https://issues.apache.org/jira/browse/HDFS-12029 (although that one is for the DataNode). It looks like "jsvc" is crashing due to a too-small "-Xss" (thread stack size) value.
So please try increasing the stack size to a higher value, such as "-Xss2m", inside the file "/usr/hdp/2.6.0.3-8/hadoop-hdfs/bin/hdfs.distro" (adjust the path to your HDP version).
Example (the following line should be added near the top, above DEFAULT_LIBEXEC_DIR, so the rest of the script picks up the value):
export HADOOP_OPTS="$HADOOP_OPTS -Xss2m"
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
.
OR, to apply the setting specifically for NFS3, set -Xss2m in all the "HADOOP_OPTS" assignments of the following block of "hdfs.distro":
# Determine if we're starting a privileged NFS daemon, and if so, redefine appropriate variables
if [ "$COMMAND" == "nfs3" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_PRIVILEGED_NFS_USER" ]; then
  if [ -n "$JSVC_HOME" ]; then
    if [ -n "$HADOOP_PRIVILEGED_NFS_PID_DIR" ]; then
      HADOOP_PID_DIR=$HADOOP_PRIVILEGED_NFS_PID_DIR
    fi
    if [ -n "$HADOOP_PRIVILEGED_NFS_LOG_DIR" ]; then
      HADOOP_LOG_DIR=$HADOOP_PRIVILEGED_NFS_LOG_DIR
      HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR -Xss2m"
    fi
    HADOOP_IDENT_STRING=$HADOOP_PRIVILEGED_NFS_USER
    HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING -Xss2m"
    starting_privileged_nfs="true"
  else
    echo "It looks like you're trying to start a privileged NFS server, but"\
         "\$JSVC_HOME isn't set. Falling back to starting unprivileged NFS server."
  fi
fi
Then restart NFS.
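After the restart, it is worth confirming that the flag actually reached the JVM. A sketch of checking the running process's command line; a sample command line stands in here for what `tr '\0' ' ' < /proc/<pid>/cmdline` would give you on a live host:

```shell
# Extract the last -Xss setting from a jsvc/java command line (when a
# JVM flag is repeated, the last occurrence wins, so the last one is
# the effective value).
cmdline='jsvc.exec -Dproc_nfs3 -Xmx1024m -Dhdp.version=2.4.3.0-227 -Xss2m'
xss=$(printf '%s\n' "$cmdline" | tr ' ' '\n' | grep -- '^-Xss' | tail -n1)
printf '%s\n' "$xss"   # prints: -Xss2m
```

An empty result would mean the edited HADOOP_OPTS never made it into the jsvc invocation.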
Reference RHEL7 kernel issue: https://access.redhat.com/errata/RHBA-2017:1674
.
Created 08-06-2017 11:24 AM
Thanks a lot, increasing the stack size for the NFS gateway as suggested helped. Thanks again, you have resolved all my issues today 🙂
Created 08-06-2017 11:29 AM