Created 07-24-2019 01:26 PM
Attempting to use / start up the HDFS NFS Gateway following the docs (skipping the instructions to stop the rpcbind service and not starting the hadoop portmap service, since the OS is neither SLES 11 nor RHEL 6.2), but running into an error when starting the hdfs nfs3 service:
[root@HW02 ~]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

[root@HW02 ~]# service nfs status
Redirecting to /bin/systemctl status nfs.service
Unit nfs.service could not be found.

[root@HW02 ~]# service nfs stop
Redirecting to /bin/systemctl stop nfs.service
Failed to stop nfs.service: Unit nfs.service not loaded.

[root@HW02 ~]# service rpcbind status
Redirecting to /bin/systemctl status rpcbind.service
● rpcbind.service - RPC bind service
   Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-07-23 13:48:54 HST; 28s ago
  Process: 27337 ExecStart=/sbin/rpcbind -w $RPCBIND_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 27338 (rpcbind)
   CGroup: /system.slice/rpcbind.service
           └─27338 /sbin/rpcbind -w

Jul 23 13:48:54 HW02.ucera.local systemd[1]: Starting RPC bind service...
Jul 23 13:48:54 HW02.ucera.local systemd[1]: Started RPC bind service.

[root@HW02 ~]# hdfs nfs3
19/07/23 13:49:33 INFO nfs3.Nfs3Base: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting Nfs3
STARTUP_MSG:   host = HW02.ucera.local/172.18.4.47
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 3.1.1.3.1.0.0-78
STARTUP_MSG:   classpath = /usr/hdp/3.1.0.0-78/hadoop/conf:/usr/hdp/3.1.0.0-78/hadoop/lib/jersey-server-1.19.jar:/usr/hdp/3.1.0.0-78/hadoop/lib/ranger-hdfs-plugin-shim-1.2.0.3.1.0.0-78.jar: ... <a bunch of other jars> ...
STARTUP_MSG:   build = git@github.com:hortonworks/hadoop.git -r e4f82af51faec922b4804d0232a637422ec29e64; compiled by 'jenkins' on 2018-12-06T12:26Z
STARTUP_MSG:   java = 1.8.0_112
************************************************************/
19/07/23 13:49:33 INFO nfs3.Nfs3Base: registered UNIX signal handlers for [TERM, HUP, INT]
19/07/23 13:49:33 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
19/07/23 13:49:33 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
19/07/23 13:49:33 INFO impl.MetricsSystemImpl: Nfs3 metrics system started
19/07/23 13:49:33 INFO oncrpc.RpcProgram: Will accept client connections from unprivileged ports
19/07/23 13:49:33 INFO security.ShellBasedIdMapping: Not doing static UID/GID mapping because '/etc/nfs.map' does not exist.
19/07/23 13:49:33 INFO nfs3.WriteManager: Stream timeout is 600000ms.
19/07/23 13:49:33 INFO nfs3.WriteManager: Maximum open streams is 256
19/07/23 13:49:33 INFO nfs3.OpenFileCtxCache: Maximum open streams is 256
19/07/23 13:49:34 INFO nfs3.DFSClientCache: Added export: / FileSystem URI: / with namenodeId: -1408097406
19/07/23 13:49:34 INFO nfs3.RpcProgramNfs3: Configured HDFS superuser is
19/07/23 13:49:34 INFO nfs3.RpcProgramNfs3: Delete current dump directory /tmp/.hdfs-nfs
19/07/23 13:49:34 INFO nfs3.RpcProgramNfs3: Create new dump directory /tmp/.hdfs-nfs
19/07/23 13:49:34 INFO nfs3.Nfs3Base: NFS server port set to: 2049
19/07/23 13:49:34 INFO oncrpc.RpcProgram: Will accept client connections from unprivileged ports
19/07/23 13:49:34 INFO mount.RpcProgramMountd: FS:hdfs adding export Path:/ with URI: hdfs://hw01.ucera.local:8020/
19/07/23 13:49:34 INFO oncrpc.SimpleUdpServer: Started listening to UDP requests at port 4242 for Rpc program: mountd at localhost:4242 with workerCount 1
19/07/23 13:49:34 ERROR mount.MountdBase: Failed to start the TCP server.
org.jboss.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:4242
        at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
        at org.apache.hadoop.oncrpc.SimpleTcpServer.run(SimpleTcpServer.java:89)
        at org.apache.hadoop.mount.MountdBase.startTCPServer(MountdBase.java:83)
        at org.apache.hadoop.mount.MountdBase.start(MountdBase.java:98)
        at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.startServiceInternal(Nfs3.java:56)
        at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.startService(Nfs3.java:69)
        at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.main(Nfs3.java:79)
Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:433)
        at sun.nio.ch.Net.bind(Net.java:425)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:391)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:315)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
19/07/23 13:49:34 INFO util.ExitUtil: Exiting with status 1: org.jboss.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:4242
19/07/23 13:49:34 INFO nfs3.Nfs3Base: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down Nfs3 at HW02.ucera.local/172.18.4.47
************************************************************/
Not sure how to interpret any of the errors seen here (and I have not installed any packages like nfs-utils, assuming that Ambari would have installed all needed packages when the cluster was initially installed).
Any debugging suggestions or solutions for what to do about this?
** UPDATE: After looking at the error, I can see
Caused by: java.net.BindException: Address already in use
and looking into what is already using that port, we see...
[root@HW02 ~]# netstat -ltnp | grep 4242
tcp        0      0 0.0.0.0:4242            0.0.0.0:*               LISTEN      98067/jsvc.exec
Not sure what this jsvc.exec process is. Does it have any known HDP-related significance? Is it safe to kill?
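(For anyone wanting to dig into an unfamiliar PID like this, a minimal sketch of the kind of checks that could be run; the PID 98067 comes from the netstat output above and will differ per machine:)

# full command line of the mystery process, including its parent PID
[root@HW02 ~]# ps -fp 98067
# the executable and working directory behind the PID
[root@HW02 ~]# ls -l /proc/98067/exe /proc/98067/cwd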
Created 07-27-2019 12:33 AM
TLDR: The NFS Gateway service was already running (by default, apparently), and the jsvc.exec process that I thought was blocking the hadoop nfs3 service from starting was (I'm assuming) part of that already-running service.
What made me suspect this was that the process also stopped when shutting down the cluster, plus the fact that it was using the port I needed for NFS. The way I confirmed it was simply following the verification steps in the docs and seeing that my output matched what was expected.
[root@HW02 ~]# rpcinfo -p hw02
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100005    1   udp   4242  mountd
    100005    2   udp   4242  mountd
    100005    3   udp   4242  mountd
    100005    1   tcp   4242  mountd
    100005    2   tcp   4242  mountd
    100005    3   tcp   4242  mountd
    100003    3   tcp   2049  nfs
[root@HW02 ~]# showmount -e hw02
Export list for hw02:
/ *
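(As one more sanity check beyond what I ran, the export can actually be mounted and browsed; a minimal sketch, with mount options along the lines of what the Hadoop NFS Gateway docs recommend and /mnt/hdfs as an arbitrary mount point:)

# create a local mount point and mount HDFS over NFSv3
[root@HW02 ~]# mkdir -p /mnt/hdfs
[root@HW02 ~]# mount -t nfs -o vers=3,proto=tcp,nolock,sync hw02:/ /mnt/hdfs
# HDFS paths should now be visible as regular files and directories
[root@HW02 ~]# ls /mnt/hdfs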
Another thing that could have told me that the jsvc process was part of an already-running HDFS NFS service would have been checking the process info...
[root@HW02 ~]# ps -feww | grep jsvc
root      61106  59083  0 14:27 pts/2    00:00:00 grep --color=auto jsvc
root     163179      1  0 12:14 ?        00:00:00 jsvc.exec -Dproc_nfs3 -outfile /var/log/hadoop/root/hadoop-hdfs-root-nfs3-HW02.ucera.local.out -errfile /var/log/hadoop/root/privileged-root-nfs3-HW02.ucera.local.err -pidfile /var/run/hadoop/root/hadoop-hdfs-root-nfs3.pid -nodetach -user hdfs -cp /usr/hdp/3.1.0.0-78/hadoop/conf:
...
and seeing jsvc.exec -Dproc_nfs3 ... to get the hint that jsvc (which apparently is a tool for running Java applications as daemons on Linux) was being used to run the very nfs3 service I was trying to start.
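(A quick way to tie the two observations together, sketched here using the -pidfile path shown in the ps output above; paths will vary with how the gateway was started:)

# PID recorded by the already-running NFS Gateway daemon
[root@HW02 ~]# cat /var/run/hadoop/root/hadoop-hdfs-root-nfs3.pid
# PIDs currently holding the mountd and nfs ports -- if these belong to the
# same process tree as the pidfile, the "blocker" is the gateway itself
[root@HW02 ~]# netstat -ltnp | grep -E '4242|2049'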
And for anyone else with this problem, note that I did not stop all of the services that the docs tell you to stop (since this is CentOS 7):
[root@HW01 /]# service nfs status
Redirecting to /bin/systemctl status nfs.service
● nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
[root@HW01 /]# service rpcbind status
Redirecting to /bin/systemctl status rpcbind.service
● rpcbind.service - RPC bind service
   Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2019-07-19 15:17:02 HST; 6 days ago
 Main PID: 2155 (rpcbind)
   CGroup: /system.slice/rpcbind.service
           └─2155 /sbin/rpcbind -w
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
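(Side note: on CentOS 7 the service commands above just redirect to systemd, so an equivalent direct check, sketched here, would be:)

# query both units directly through systemd
[root@HW01 /]# systemctl status nfs-server.service rpcbind.service
# check whether they are set to start at boot
[root@HW01 /]# systemctl is-enabled nfs-server.service rpcbind.service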