Support Questions

rvillanueva · ‎07-24-2019

Attempting to use / startup HDFS NFS following the docs (ignoring the instructions to stop the rpcbind service and did not start the hadoop portmap service given that the OS is not SLES 11 and RHEL 6.2), but running into error when trying to set up the NFS service starting the hdfs nfs3service:

[root@HW02 ~]# [root@HW02 ~]# [root@HW02 ~]# cat /etc/os-release NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="7" PRETTY_NAME="CentOS Linux 7 (Core)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:centos:centos:7" HOME_URL="https://www.centos.org/" BUG_REPORT_URL="https://bugs.centos.org/" CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7" [root@HW02 ~]# [root@HW02 ~]# [root@HW02 ~]# service nfs status Redirecting to /bin/systemctl status nfs.service Unit nfs.service could not be found. [root@HW02 ~]# [root@HW02 ~]# [root@HW02 ~]# service nfs stop Redirecting to /bin/systemctl stop nfs.service Failed to stop nfs.service: Unit nfs.service not loaded. [root@HW02 ~]# [root@HW02 ~]# [root@HW02 ~]# service rpcbind status Redirecting to /bin/systemctl status rpcbind.service ● rpcbind.service - RPC bind service Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled; vendor preset: enabled) Active: active (running) since Tue 2019-07-23 13:48:54 HST; 28s ago Process: 27337 ExecStart=/sbin/rpcbind -w $RPCBIND_ARGS (code=exited, status=0/SUCCESS) Main PID: 27338 (rpcbind) CGroup: /system.slice/rpcbind.service └─27338 /sbin/rpcbind -w Jul 23 13:48:54 HW02.ucera.local systemd[1]: Starting RPC bind service... Jul 23 13:48:54 HW02.ucera.local systemd[1]: Started RPC bind service. [root@HW02 ~]# [root@HW02 ~]# [root@HW02 ~]# hdfs nfs3 19/07/23 13:49:33 INFO nfs3.Nfs3Base: STARTUP_MSG: /************************************************************ STARTUP_MSG: Starting Nfs3 STARTUP_MSG: host = HW02.ucera.local/172.18.4.47 STARTUP_MSG: args = [] STARTUP_MSG: version = 3.1.1.3.1.0.0-78 STARTUP_MSG: classpath = /usr/hdp/3.1.0.0-78/hadoop/conf:/usr/hdp/3.1.0.0-78/hadoop/lib/jersey-server-1.19.jar:/usr/hdp/3.1.0.0-78/hadoop/lib/ranger-hdfs-plugin-shim-1.2.0.3.1.0.0-78.jar: ... <a bunch of other jars> ... STARTUP_MSG: build = git@github.com:hortonworks/hadoop.git -r e4f82af51faec922b4804d0232a637422ec29e64; compiled by 'jenkins' on 2018-12-06T12:26Z STARTUP_MSG: java = 1.8.0_112 ************************************************************/ 19/07/23 13:49:33 INFO nfs3.Nfs3Base: registered UNIX signal handlers for [TERM, HUP, INT] 19/07/23 13:49:33 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties 19/07/23 13:49:33 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s). 19/07/23 13:49:33 INFO impl.MetricsSystemImpl: Nfs3 metrics system started 19/07/23 13:49:33 INFO oncrpc.RpcProgram: Will accept client connections from unprivileged ports 19/07/23 13:49:33 INFO security.ShellBasedIdMapping: Not doing static UID/GID mapping because '/etc/nfs.map' does not exist. 19/07/23 13:49:33 INFO nfs3.WriteManager: Stream timeout is 600000ms. 19/07/23 13:49:33 INFO nfs3.WriteManager: Maximum open streams is 256 19/07/23 13:49:33 INFO nfs3.OpenFileCtxCache: Maximum open streams is 256 19/07/23 13:49:34 INFO nfs3.DFSClientCache: Added export: / FileSystem URI: / with namenodeId: -1408097406 19/07/23 13:49:34 INFO nfs3.RpcProgramNfs3: Configured HDFS superuser is 19/07/23 13:49:34 INFO nfs3.RpcProgramNfs3: Delete current dump directory /tmp/.hdfs-nfs 19/07/23 13:49:34 INFO nfs3.RpcProgramNfs3: Create new dump directory /tmp/.hdfs-nfs 19/07/23 13:49:34 INFO nfs3.Nfs3Base: NFS server port set to: 2049 19/07/23 13:49:34 INFO oncrpc.RpcProgram: Will accept client connections from unprivileged ports 19/07/23 13:49:34 INFO mount.RpcProgramMountd: FS:hdfs adding export Path:/ with URI: hdfs://hw01.ucera.local:8020/ 19/07/23 13:49:34 INFO oncrpc.SimpleUdpServer: Started listening to UDP requests at port 4242 for Rpc program: mountd at localhost:4242 with workerCount 1 19/07/23 13:49:34 ERROR mount.MountdBase: Failed to start the TCP server. org.jboss.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:4242 at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) at org.apache.hadoop.oncrpc.SimpleTcpServer.run(SimpleTcpServer.java:89) at org.apache.hadoop.mount.MountdBase.startTCPServer(MountdBase.java:83) at org.apache.hadoop.mount.MountdBase.start(MountdBase.java:98) at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.startServiceInternal(Nfs3.java:56) at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.startService(Nfs3.java:69) at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.main(Nfs3.java:79) Caused by: java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:433) at sun.nio.ch.Net.bind(Net.java:425) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193) at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391) at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315) at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42) at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 19/07/23 13:49:34 INFO util.ExitUtil: Exiting with status 1: org.jboss.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:4242 19/07/23 13:49:34 INFO nfs3.Nfs3Base: SHUTDOWN_MSG: /************************************************************ SHUTDOWN_MSG: Shutting down Nfs3 at HW02.ucera.local/172.18.4.47 ************************************************************/

Not sure how to interpret any of the errors seen here (and have not installed any packages like nfs-utils, assuming that Ambari would have installed all needed packages when cluster was initially installed).

Any debugging suggestions or solutions for what to do about this?

** UPDATE: After looking at the error, I can see

Caused by: java.net.BindException: Address already in use

and looking into what is already using it, we see...

[root@HW02 ~]# netstat -ltnp | grep 4242 tcp 0 0 0.0.0.0:4242 0.0.0.0:* LISTEN 98067/jsvc.exec

Not sure what this is. Does this have any known HDP-related significance? Is this safe to delete?

rvillanueva · ‎07-27-2019

TLDR: nfs gateway service was already running (by default, apparently) and the service that I thought was blocking the hadoop nfs3 service (jsvc.exec) from starting was (I'm assuming) part of that service already running.

What made me suspect this was that when shutting down the cluster, the service also stopped plus the fact that it was using the port I needed for nfs. The way that I confirmed this was just from following the verification steps in the docs and seeing that my output was similar to what should be expected.

[root@HW02 ~]# rpcinfo -p hw02   
program vers proto   port  service    100000    4   tcp    111  portmapper    100000    3   tcp    111  portmapper    100000    2   tcp    111  portmapper    100000    4   udp    111  portmapper    100000    3   udp    111  portmapper    100000    2   udp    111  portmapper    100005    1   udp   4242  mountd    100005    2   udp   4242  mountd    100005    3   udp   4242  mountd    100005    1   tcp   4242  mountd    100005    2   tcp   4242  mountd    100005    3   tcp   4242  mountd    100003    3   tcp   2049  nfs

[root@HW02 ~]# showmount -e hw02
Export list for hw02:
/ *

Another thing that could told me that the jsvc process was part of an already running hdfs nfs service would have been checking the process info...

[root@HW02 ~]# ps -feww | grep jsvc root      61106  59083  0 14:27 pts/2    00:00:00 grep --color=auto jsvc root     163179      1  0 12:14 ?        00:00:00 jsvc.exec -Dproc_nfs3 -outfile /var/log/hadoop/root/hadoop-hdfs-root-nfs3-HW02.ucera.local.out -errfile /var/log/hadoop/root/privileged-root-nfs3-HW02.ucera.local.err -pidfile /var/run/hadoop/root/hadoop-hdfs-root-nfs3.pid -nodetach -user hdfs -cp /usr/hdp/3.1.0.0-78/hadoop/conf:
...

and seeing jsvc.exec -Dproc_nfs3 ... to get the hint that jsvc (which apparently is for running java apps on linux) was being used to run the very nfs3 service I was trying to start.

And for anyone else with this problem, note that I did not stop all the services that the docs want you to stop (since using centos7)

[root@HW01 /]# service nfs status
Redirecting to /bin/systemctl status nfs.service
● nfs-server.service - NFS server and services   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; disabled; vendor preset: disabled)   Active: inactive (dead)
[root@HW01 /]# service rpcbind status
Redirecting to /bin/systemctl status rpcbind.service
● rpcbind.service - RPC bind service   Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled; vendor preset: enabled)   Active: active (running) since Fri 2019-07-19 15:17:02 HST; 6 days ago Main PID: 2155 (rpcbind)   CGroup: /system.slice/rpcbind.service           └─2155 /sbin/rpcbind -w 
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

View solution in original post

rvillanueva · ‎07-27-2019