
Trouble starting NFS Gateway (cannot access /tmp/.hdfs-nfs)

I have the HDP 2.5 sandbox installed on an Azure VM. Right now I can't get Hive or HDFS to function, and I see that my NFSGateways are not running. They won't restart.

I get the errors below, and I see that /tmp/.hdfs-nfs does not exist as a file or directory.

stderr: /var/lib/ambari-agent/data/errors-241.txt

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/", line 147, in <module>
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/", line 280, in execute
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/", line 720, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/", line 57, in start
  File "/var/lib/ambari-agent/cache/common-services/HDFS/", line 71, in configure
  File "/var/lib/ambari-agent/cache/common-services/HDFS/", line 66, in nfsgateway
    group = params.user_group,
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 155, in __init__
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 124, in run_action
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/", line 191, in action_create
    sudo.makedir(path, self.resource.mode or 0755)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 90, in makedir
OSError: [Errno 22] Invalid argument: '/tmp/.hdfs-nfs'
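For anyone hitting a similar failure, a minimal sketch to check whether the directory can even be created outside of Ambari. The helper below is hypothetical (it just mirrors the `sudo.makedir(path, mode)` call in the traceback); on a healthy filesystem a dot-prefixed name containing a hyphen is perfectly legal, so an EINVAL here tends to point at how the configured value was parsed rather than at the path itself:

```python
import errno
import os


def try_makedir(path, mode=0o755):
    """Attempt to create a directory the way the Ambari resource provider
    does, and report what happened. Diagnostic helper, assumption only."""
    try:
        os.makedirs(path, mode)
        return "created"
    except OSError as e:
        if e.errno == errno.EEXIST:
            return "already exists"
        return "failed: [Errno %d] %s" % (e.errno, e.strerror)


# A hyphenated hidden directory is a legal name; if this succeeds while
# Ambari fails, suspect the config value rather than the filesystem.
print(try_makedir("/tmp/.hdfs-nfs-test"))
```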

If there are specific ports to expose or config files to edit, please specify which user I should use to modify them or to run commands. Any direction on how to recover HDFS would be helpful. The Hive service check passed, and in Ambari the HDFS DataNodes appear to be up and running too.

Thanks. - Colin


@Colin Cunningham

Can you provide more information on what you mean by "I can't get hive or hdfs to function"? The NFSGateway is not needed for either Hive or HDFS to function properly. The NFSGateway is used only to mount the HDFS file system via NFS for other servers or systems outside of Hadoop.

@Michael Young Sure. I was trying to run a benchmark, ran into issues, so I stepped back and tried to run the Hive tutorial-100.

For that, I went to the Files View in Ambari. As admin or maria_dev, I saw this.


In the stack trace, I see:

  Service 'hdfs' check failed:
org.apache.ambari.view.utils.hdfs.HdfsApiException: HDFS040 Couldn't open connection to HDFS
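That HDFS040 error generally means the view could not open a connection to HDFS at all. A minimal sketch for checking whether the NameNode endpoint is even reachable; the host and port in the comment are assumptions for the sandbox (8020 is the usual NameNode RPC port, but verify against your config):

```python
import socket


def port_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within
    the timeout. A crude connectivity check, nothing HDFS-specific."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (assumed sandbox hostname and default RPC port):
# print(port_reachable("sandbox.hortonworks.com", 8020))
```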

Is this enough to help with the diagnosis? I then went to the HDFS UI and tried a restart, which failed. I noted there that the NFS gateways were failing to start.

My only idea is to start over with a fresh build, but that doesn't guarantee anything. Thanks. -Colin

@Colin Cunningham

Can you provide the output from the Ambari task that attempts to restart HDFS? It looks like this:


Just expand the tasks that fail and copy/paste the errors that you see. The NFSGateway is probably failing to start because HDFS itself is not coming up properly. The logging information will also be in the HDFS logs located in /var/log/hadoop/hdfs on the Sandbox. You can look there as well.
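To find the most recent daemon logs quickly, a small sketch; the directory is the one mentioned above, but the glob pattern is an assumption since log file names vary by install:

```python
import glob
import os


def newest_logs(log_dir, pattern="*.log", n=5):
    """Return the n most recently modified files matching pattern in
    log_dir, newest first. E.g. newest_logs("/var/log/hadoop/hdfs")."""
    paths = glob.glob(os.path.join(log_dir, pattern))
    return sorted(paths, key=os.path.getmtime, reverse=True)[:n]
```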

@Michael Young. I believe I once saw the same order of tasks before, and it was the Restart NFSGateway task that failed. I'm not sure how to access the view you see. Here is a different screenshot in case it is helpful.


@Michael Young (restarthdfs.png) OK. I attached the queue of tasks; HDFS said we needed to restart 9 components. It is a longer list than yours.

- Colin

It looks like you may be doing a restart of all components. Have you tried restarting just HDFS? I have attached two screenshots; the red arrows show how to get to the option:



@Colin Cunningham

When a component says it needs to "restart components", that is related to configuration changes in Ambari. If you perform a "restart of affected components", components that are in a stopped state won't always restart.

I find that doing a focused restart of the component after restarting all affected components helps.

Removing the '-' from .hdfs-nfs in the "NFSGateway dump directory" parameter (under the NFSGateway section of the HDFS Advanced config) fixed it.
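For reference, I believe that Ambari parameter corresponds to the NFS gateway dump-directory property in hdfs-site.xml (the property name `nfs.dump.dir` and the exact renamed value are my assumptions; check what Ambari actually wrote out), so the workaround amounts to something like:

```xml
<property>
  <!-- assumed property name; the dot-directory renamed to drop the '-' -->
  <name>nfs.dump.dir</name>
  <value>/tmp/.hdfsnfs</value>
</property>
```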

This is what it looks like (nfsgateway.png).

Oddly, I didn't need to do this on the HDP sandbox installed on my Linux desktop. Only the one on the Azure VM gave me this fit.