
NFS Gateway is failing automatically

Contributor

Dear experts,

I am running HDP 2.4.3 with Ambari 2.4 on AWS EC2 instances running Red Hat Enterprise Linux Server release 7.3 (Maipo). Whenever I start the NFSGATEWAY service on a host, it automatically stops after some time. Could you please assist me with this?

Even if I kill the existing nfs3 process and restart the service, the issue still persists. Please find a few details below:

ps -ef | grep nfs3

----------------------------------------------------------

root 9766 1 0 01:42 pts/0 00:00:00 jsvc.exec -Dproc_nfs3 -outfile /var/log/hadoop/root/nfs3_jsvc.out -errfile /var/log/hadoop/root/nfs3_jsvc.err -pidfile /var/run/hadoop/root/hadoop_privileged_nfs3.pid -nodetach -user hdfs -cp /usr/hdp/current/hadoop-client/conf:/usr/hdp/2.4.3.0-227/hadoop/lib/*:/usr/hdp/2.4.3.0-227/hadoop/.//*:/usr/hdp/2.4.3.0-227/hadoop-hdfs/./:/usr/hdp/2.4.3.0-227/hadoop-hdfs/lib/*:/usr/hdp/2.4.3.0-227/hadoop-hdfs/.//*:/usr/hdp/2.4.3.0-227/hadoop-yarn/lib/*:/usr/hdp/2.4.3.0-227/hadoop-yarn/.//*:/usr/hdp/2.4.3.0-227/hadoop-mapreduce/lib/*:/usr/hdp/2.4.3.0-227/hadoop-mapreduce/.//*::/usr/hdp/2.4.3.0-227/tez/*:/usr/hdp/2.4.3.0-227/tez/lib/*:/usr/hdp/2.4.3.0-227/tez/conf:/usr/hdp/2.4.3.0-227/tez/*:/usr/hdp/2.4.3.0-227/tez/lib/*:/usr/hdp/2.4.3.0-227/tez/conf -Xmx1024m -Dhdp.version=2.4.3.0-227 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/ -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.4.3.0-227/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.4.3.0-227/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.3.0-227/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.4.3.0-227 -Dhadoop.log.dir=/var/log/hadoop/ -Dhadoop.log.file=hadoop-hdfs-nfs3-ip-10-0-0-223.ap-south-1.compute.internal.log -Dhadoop.home.dir=/usr/hdp/2.4.3.0-227/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,RFA -Djava.library.path=:/usr/hdp/2.4.3.0-227/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.3.0-227/hadoop/lib/native:/usr/hdp/2.4.3.0-227/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.3.0-227/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/root -Dhadoop.id.str=hdfs -Xmx1024m -Dhadoop.security.logger=ERROR,DRFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.nfs.nfs3.PrivilegedNfsGatewayStarter

systemctl status rpcbind

--------------------------------------------------

● rpcbind.service - RPC bind service
   Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; indirect; vendor preset: enabled)
   Active: active (running) since Sun 2017-08-06 01:29:31 EDT; 18min ago
 Main PID: 6164 (rpcbind)
   CGroup: /system.slice/rpcbind.service
           └─6164 /sbin/rpcbind -w

1 ACCEPTED SOLUTION

Master Mentor

@Chiranjeevi Nimmala

Do you see any errors for the NFS service? Can you please share the NFS logs from the "/var/log/hadoop/root/" directory, such as "nfs3_jsvc.out", "nfs3_jsvc.err", and "hadoop-hdfs-nfs3-ip-10-0-0-223.ap-south-1.compute.internal.log"?


Can you also check what ulimit value is set for the NFS service?


Sometimes the NFS Gateway crashes because the file descriptor limit is too low. If the value is set too low, you can try increasing it from the Ambari UI as follows:

Navigate to "Ambari UI --> HDFS --> Configs --> Advanced --> Advanced hadoop-env --> hadoop-env template"


Now add the following entry to the "hadoop-env" template:

if [ "$command" == "nfs3" ]; then ulimit -n 128000 ; fi


Then try restarting the NFSGateway.
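
To confirm the higher limit actually took effect after the restart, one quick check is to look at the live process limits. This is just a sketch; the pgrep pattern is an assumption based on the -Dproc_nfs3 flag visible in the ps output above:

# Find the privileged nfs3 process and print its effective open-files limit
NFS3_PID=$(pgrep -f proc_nfs3 | head -1)
grep "Max open files" /proc/${NFS3_PID}/limits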


7 REPLIES

Contributor

I have tried changing the ulimit as suggested and restarted the gateway, but still no luck. I don't see any .log file, but I am able to get a few details, as below.

/var/log/hadoop/root

nfs3_jsvc.out

-------------------------

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00007f7b0a23bb7c, pid=19469, tid=140166720608064
#
# JRE version:  (8.0_77-b03) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.77-b03 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# j  java.lang.Object.<clinit>()V+0
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid19469.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp

hadoop-hdfs-nfs3-XXXXXXX.out

-------------------------------------------------------

ulimit -a for privileged nfs user hdfs
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63392
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 128000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Master Mentor

@Chiranjeevi Nimmala

The following error line indicates that it is a JVM crash:

# An error report file with more information is saved as:
# /tmp/hs_err_pid19469.log

.

So if you can share the complete "/tmp/hs_err_pid19469.log" file here, then we can check why the JVM crashed.
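
If attaching the full file is not convenient, the crash summary near the top of the hs_err log is usually the most useful part; a quick way to pull it (the path is taken from the error message above, and the wildcard listing is only needed if the PID has changed):

# Show the crash summary at the top of the error report
head -n 40 /tmp/hs_err_pid19469.log
# Or list the most recent crash logs if the PID has changed
ls -lt /tmp/hs_err_pid*.log | head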

Contributor

hs-err-pid26771.txt: attaching the latest log file for pid 26771.

Master Mentor

@Chiranjeevi Nimmala

This looks somewhat similar to https://issues.apache.org/jira/browse/HDFS-12029 (although that one is for the DataNode). It looks like "jsvc" is crashing because the "Xss" (thread stack size) value is too small.
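
As a side note, the thread stack size the installed JVM is currently configured with can be checked on the gateway host; this is only a diagnostic, not part of the fix:

# Print the HotSpot thread stack size flags (values are in KB; 0 means the platform default)
java -XX:+PrintFlagsFinal -version 2>/dev/null | grep -i threadstacksize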

So please try increasing the stack size to a higher value such as "-Xss2m" inside the file "/usr/hdp/2.6.0.3-8/hadoop-hdfs/bin/hdfs.distro" (adjust the HDP version in the path to match your installation, e.g. 2.4.3.0-227).

Example: the following line should be added near the top, above DEFAULT_LIBEXEC_DIR, so that the rest of the script can use this value.

export HADOOP_OPTS="$HADOOP_OPTS -Xss2m"
DEFAULT_LIBEXEC_DIR="$bin"/../libexec

.

Or, to apply the setting specifically to NFS3, set -Xss2m in all the "HADOOP_OPTS" assignments of the following block of "hdfs.distro":

# Determine if we're starting a privileged NFS daemon, and if so, redefine appropriate variables
if [ "$COMMAND" == "nfs3" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_PRIVILEGED_NFS_USER" ]; then
  if [ -n "$JSVC_HOME" ]; then
    if [ -n "$HADOOP_PRIVILEGED_NFS_PID_DIR" ]; then
      HADOOP_PID_DIR=$HADOOP_PRIVILEGED_NFS_PID_DIR
    fi

    if [ -n "$HADOOP_PRIVILEGED_NFS_LOG_DIR" ]; then
      HADOOP_LOG_DIR=$HADOOP_PRIVILEGED_NFS_LOG_DIR
      HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR -Xss2m"
    fi
   
    HADOOP_IDENT_STRING=$HADOOP_PRIVILEGED_NFS_USER
    HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING -Xss2m"
    starting_privileged_nfs="true"
  else
    echo "It looks like you're trying to start a privileged NFS server, but"\
      "\$JSVC_HOME isn't set. Falling back to starting unprivileged NFS server."
  fi
fi

Then restart NFS.
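
Once the gateway is back up, a quick sanity check (the grep pattern below is just an illustration based on the ps output earlier in this thread) confirms the new setting made it onto the nfs3 command line:

# Confirm that the running nfs3 JVM picked up the larger stack size
ps -ef | grep proc_nfs3 | grep -o 'Xss[0-9]*[mMkK]'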

Reference RHEL7 kernel issue: https://access.redhat.com/errata/RHBA-2017:1674

.

Contributor

Thanks a lot, increasing the stack size as suggested for the NFS Gateway helped. Thanks again, you have resolved all my issues today 🙂

Master Mentor

@Chiranjeevi Nimmala

Good to hear that your issue is resolved. It would also be wonderful if you could mark this thread as "Answered" (Accepted) so that other HCC users can quickly find the correct answer for similar issues.