Created on 08-06-2017 05:58 AM - edited 09-16-2022 05:03 AM
Dear experts,
I am running HDP 2.4.3 with Ambari 2.4 on AWS EC2 instances running Red Hat Enterprise Linux Server release 7.3 (Maipo). Whenever I start the NFSGATEWAY service on a host, it stops automatically after some time. Could you please assist me with this?
Even if I kill the existing nfs3 process and restart the service, the issue persists. Please find a few details below:
ps -ef | grep nfs3
----------------------------------------------------------
root 9766 1 0 01:42 pts/0 00:00:00 jsvc.exec -Dproc_nfs3 -outfile /var/log/hadoop/root/nfs3_jsvc.out -errfile /var/log/hadoop/root/nfs3_jsvc.err -pidfile /var/run/hadoop/root/hadoop_privileged_nfs3.pid -nodetach -user hdfs -cp /usr/hdp/current/hadoop-client/conf:/usr/hdp/2.4.3.0-227/hadoop/lib/*:/usr/hdp/2.4.3.0-227/hadoop/.//*:/usr/hdp/2.4.3.0-227/hadoop-hdfs/./:/usr/hdp/2.4.3.0-227/hadoop-hdfs/lib/*:/usr/hdp/2.4.3.0-227/hadoop-hdfs/.//*:/usr/hdp/2.4.3.0-227/hadoop-yarn/lib/*:/usr/hdp/2.4.3.0-227/hadoop-yarn/.//*:/usr/hdp/2.4.3.0-227/hadoop-mapreduce/lib/*:/usr/hdp/2.4.3.0-227/hadoop-mapreduce/.//*::/usr/hdp/2.4.3.0-227/tez/*:/usr/hdp/2.4.3.0-227/tez/lib/*:/usr/hdp/2.4.3.0-227/tez/conf:/usr/hdp/2.4.3.0-227/tez/*:/usr/hdp/2.4.3.0-227/tez/lib/*:/usr/hdp/2.4.3.0-227/tez/conf -Xmx1024m -Dhdp.version=2.4.3.0-227 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/ -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.4.3.0-227/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.4.3.0-227/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.3.0-227/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.4.3.0-227 -Dhadoop.log.dir=/var/log/hadoop/ -Dhadoop.log.file=hadoop-hdfs-nfs3-ip-10-0-0-223.ap-south-1.compute.internal.log -Dhadoop.home.dir=/usr/hdp/2.4.3.0-227/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,RFA -Djava.library.path=:/usr/hdp/2.4.3.0-227/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.3.0-227/hadoop/lib/native:/usr/hdp/2.4.3.0-227/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.3.0-227/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/root -Dhadoop.id.str=hdfs -Xmx1024m -Dhadoop.security.logger=ERROR,DRFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.nfs.nfs3.PrivilegedNfsGatewayStarter
systemctl status rpcbind
--------------------------------------------------
● rpcbind.service - RPC bind service
   Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; indirect; vendor preset: enabled)
   Active: active (running) since Sun 2017-08-06 01:29:31 EDT; 18min ago
 Main PID: 6164 (rpcbind)
   CGroup: /system.slice/rpcbind.service
           └─6164 /sbin/rpcbind -w
Created 08-06-2017 06:11 AM
Do you see any errors for the NFS service? Can you please share the NFS logs from the "/var/log/hadoop/root/" directory, such as "nfs3_jsvc.out", "nfs3_jsvc.err", and "hadoop-hdfs-nfs3-ip-10-0-0-223.ap-south-1.compute.internal.log"?
Can you also check what ulimit value is set for the NFS service?
Sometimes NFS crashes because the file descriptor limit is too low. If the value is set too low, we can try increasing it to a higher value from the Ambari UI as follows:
Navigate to "Ambari UI --> HDFS --> Configs --> Advanced --> Advanced hadoop-env --> hadoop-env template"
Now add the following entry to the "hadoop-env" template:
if [ "$command" == "nfs3" ]; then ulimit -n 128000 ; fi
Then try restarting the NFSGateway.
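After the restart, a rough way to confirm the new limit actually took effect (assuming the pid file path shown in your ps output above) is:
# Sketch only: read the nfs3 pid from the pid file seen in the ps output, then check its limits
NFS_PID=$(cat /var/run/hadoop/root/hadoop_privileged_nfs3.pid)
grep "Max open files" /proc/$NFS_PID/limits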
Created 08-06-2017 06:30 AM
I have tried changing the ulimit as suggested and restarted the gateway, but still no luck. I don't see any .log file, but I was able to get a few details as below:
/var/log/hadoop/root
nfs3_jsvc.out
-------------------------
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00007f7b0a23bb7c, pid=19469, tid=140166720608064
#
# JRE version:  (8.0_77-b03) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.77-b03 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# j  java.lang.Object.<clinit>()V+0
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid19469.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
hadoop-hdfs-nfs3-XXXXXXX.out
-------------------------------------------------------
ulimit -a for privileged nfs user hdfs
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63392
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 128000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Created 08-06-2017 06:53 AM
The following error message indicates that it is a JVM crash:
# An error report file with more information is saved as:
# /tmp/hs_err_pid19469.log
.
So if you can share the complete "/tmp/hs_err_pid19469.log" file here, we can check why the JVM crashed.
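If the whole file is too large to paste, a minimal sketch like the following (using the path from the crash message above) pulls out the sections that usually explain the crash:
# Sketch only: show the problematic frame and the current thread section from the crash report
grep -A 3 "Problematic frame" /tmp/hs_err_pid19469.log
grep -A 20 "Current thread" /tmp/hs_err_pid19469.log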
Created 08-06-2017 07:07 AM
Attached hs-err-pid26771.txt, the latest crash log for pid 26771.
Created 08-06-2017 07:19 AM
It looks somewhat similar to https://issues.apache.org/jira/browse/HDFS-12029 (although that one is for the DataNode). It appears that "jsvc" is crashing due to a low "-Xss" (thread stack size) value.
So please try increasing the stack size to a higher value such as "-Xss2m" inside the file "/usr/hdp/2.6.0.3-8/hadoop-hdfs/bin/hdfs.distro" (adjust the path to match your HDP version).
Example: the following line should be added near the top, above DEFAULT_LIBEXEC_DIR, so that the rest of the script can use this value:
export HADOOP_OPTS="$HADOOP_OPTS -Xss2m"
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
.
OR, to apply the setting specifically for NFS3, set -Xss2m on all of the "HADOOP_OPTS" assignments inside the following block of "hdfs.distro":
# Determine if we're starting a privileged NFS daemon, and if so, redefine appropriate variables
if [ "$COMMAND" == "nfs3" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_PRIVILEGED_NFS_USER" ]; then
  if [ -n "$JSVC_HOME" ]; then
    if [ -n "$HADOOP_PRIVILEGED_NFS_PID_DIR" ]; then
      HADOOP_PID_DIR=$HADOOP_PRIVILEGED_NFS_PID_DIR
    fi

    if [ -n "$HADOOP_PRIVILEGED_NFS_LOG_DIR" ]; then
      HADOOP_LOG_DIR=$HADOOP_PRIVILEGED_NFS_LOG_DIR
      HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR -Xss2m"
    fi

    HADOOP_IDENT_STRING=$HADOOP_PRIVILEGED_NFS_USER
    HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING -Xss2m"
    starting_privileged_nfs="true"
  else
    echo "It looks like you're trying to start a privileged NFS server, but"\
         "\$JSVC_HOME isn't set. Falling back to starting unprivileged NFS server."
  fi
fi
Then restart NFS.
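After the restart, a rough check (assuming the nfs3 process is still launched through jsvc as in the ps output earlier in this thread) is to confirm the new stack size shows up on the running command line:
# Sketch only: print any -Xss option present on the running nfs3 command line
ps -ef | grep '[n]fs3' | tr ' ' '\n' | grep '^-Xss'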
Reference RHEL7 kernel issue: https://access.redhat.com/errata/RHBA-2017:1674
.
Created 08-06-2017 11:24 AM
Thanks a lot, increasing the stack size as suggested for the NFS gateway helped. Thanks again, you have resolved all my issues today 🙂
Created 08-06-2017 11:29 AM