Created on 02-22-2017 08:11 PM - edited 09-16-2022 04:08 AM
I have the HDP 2.5 sandbox installed on an Azure VM. Right now, I can't get Hive or HDFS to function, and I see that my NFSGateways are not running. They won't restart.
I get the errors below. When I look, there is no /tmp/.hdfs-nfs file or directory in existence.
stderr: /var/lib/ambari-agent/data/errors-241.txt
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/nfsgateway.py", line 147, in <module>
    NFSGateway().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 720, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/nfsgateway.py", line 57, in start
    self.configure(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/nfsgateway.py", line 71, in configure
    nfsgateway(action="configure")
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_nfsgateway.py", line 66, in nfsgateway
    group = params.user_group,
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 191, in action_create
    sudo.makedir(path, self.resource.mode or 0755)
  File "/usr/lib/python2.6/site-packages/resource_management/core/sudo.py", line 90, in makedir
    os.mkdir(path)
OSError: [Errno 22] Invalid argument: '/tmp/.hdfs-nfs'
If there are specific ports to expose or config files to edit, can you please specify under which user I should modify them or run commands? Any direction on how to recover HDFS would be helpful. The Hive service check passed. In Ambari, the HDFS data nodes look to be up and running too.
Thanks. - Colin
Created 02-22-2017 08:16 PM
Can you provide more information on what you mean by "I can't get hive or hdfs to function"? The NFSGateway is not needed for either Hive or HDFS to function properly. The NFSGateway is used only to mount the HDFS file system via NFS for other servers or systems outside of Hadoop.
Created 02-22-2017 09:20 PM
@Michael Young Sure. I was trying to run a benchmark and ran into issues, so I stepped back and tried to run the hive tutorial-100.
For that, I go to the Files View in Ambari. As admin or maria_dev, I saw this.
In the stack trace, I see:
Service 'hdfs' check failed: org.apache.ambari.view.utils.hdfs.HdfsApiException: HDFS040 Couldn't open connection to HDFS ...
Is this enough to help in the diagnosis? I then went to the HDFS UI and tried to restart, and that failed. I noted there that the NFS gateways were failing to start.
My only idea is to start over with a fresh build, but that doesn't guarantee anything. Thanks. - Colin
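For the HDFS040 "Couldn't open connection to HDFS" message above, one quick thing to rule out is whether the NameNode port is reachable at all from where the check runs. A minimal sketch, assuming Python 3 is available; the helper name can_connect is hypothetical, and the host and port in the usage comment (sandbox.hortonworks.com, 8020) are assumptions about a default sandbox setup, not values taken from this thread:

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage (assumed host and port; substitute the value of fs.defaultFS):
# print(can_connect("sandbox.hortonworks.com", 8020))
```

A True result only shows that something is listening on that port; it does not prove HDFS is healthy. A False result points at the NameNode being down or the port being blocked.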
Created on 02-22-2017 11:45 PM - edited 08-19-2019 03:34 AM
Can you provide the output from the Ambari task that attempts to restart HDFS? It looks like this:
Just expand the tasks that fail and copy/paste the errors that you see. The NFSGateway is probably failing to start because HDFS itself is not coming up properly. The logging information will also be in the HDFS logs located in /var/log/hadoop/hdfs on the Sandbox. You can look there as well.
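If reading through the files in /var/log/hadoop/hdfs by hand is tedious, the error lines can be pulled out with a short script. A minimal sketch, assuming Python 3, log files ending in .log, and log4j-style lines that contain the level token ERROR or FATAL; the function name find_error_lines is hypothetical:

```python
import os
import re

# Assumption: log lines carry a level token such as ERROR or FATAL.
LEVEL_PATTERN = re.compile(r"\b(ERROR|FATAL)\b")


def find_error_lines(log_dir, suffix=".log", max_per_file=20):
    """Return {filename: [matching lines]} for files in log_dir ending in suffix.

    Only the last max_per_file matches of each file are kept.
    """
    results = {}
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not (name.endswith(suffix) and os.path.isfile(path)):
            continue
        matches = []
        # errors="replace" avoids crashing on stray non-UTF-8 bytes.
        with open(path, "r", errors="replace") as handle:
            for line in handle:
                if LEVEL_PATTERN.search(line):
                    matches.append(line.rstrip("\n"))
        if matches:
            results[name] = matches[-max_per_file:]
    return results


if __name__ == "__main__":
    log_dir = "/var/log/hadoop/hdfs"  # location mentioned above
    if os.path.isdir(log_dir):
        for name, lines in find_error_lines(log_dir).items():
            print("==", name)
            for line in lines:
                print(line)
```

The script needs to run as a user that can read the log directory.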
Created 02-23-2017 12:17 AM
@Michael Young. I believe I once saw the same order of tasks before, and it was the Restart NFSGateway task that failed. I'm not sure how to access the view you see. Here is a different screenshot in case it is helpful.
Created 02-23-2017 12:29 AM
@Michael Young OK. I attached a screenshot (restarthdfs.png) of the queue of operations. HDFS said 9 components needed a restart. It is a longer list than yours.
- Colin
Created on 02-23-2017 12:44 AM - edited 08-19-2019 03:34 AM
It looks like you may be doing a restart of all components. Have you tried just restarting HDFS? I have attached two screenshots. The red arrows show how to get to the option:
Created 02-23-2017 01:04 AM
When a component says it needs to "restart components", that is related to configuration changes in Ambari. If you perform a "restart of affected components", the components won't always restart if they are in a stopped state.
I find doing a focused restart of the component after doing a restart of all affected components helps.
Created 02-24-2017 05:53 PM
Removing the '-' from .hdfs-nfs in the HDFS Advanced config under the NFSGateway section on the "NFSGateway dump directory" parameter fixed it.
This is what it looks like: nfsgateway.png
Oddly, I didn't need to do this on the HDP sandbox installed on my Linux desktop. Only the one on the Azure VM gave me this trouble.
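To check whether a given directory name is the problem independently of Ambari, the failing os.mkdir call from the traceback can be tried by hand. A minimal sketch, assuming Python 3; the helper name try_mkdir is hypothetical, and the script should be run as the same user the NFSGateway runs as, which is an assumption about the setup:

```python
import errno
import os


def try_mkdir(path, mode=0o755):
    """Attempt os.mkdir(path); return (True, None) or (False, errno name)."""
    try:
        os.mkdir(path, mode)
        return True, None
    except OSError as exc:
        # Map the numeric errno (e.g. 22) to its symbolic name (e.g. EINVAL).
        return False, errno.errorcode.get(exc.errno, str(exc.errno))


# Example usage (creates the directory if the call succeeds):
# print(try_mkdir("/tmp/.hdfs-nfs"))
```

If the call reproduces the failure from the traceback, the second value would be EINVAL, which is errno 22 on Linux. Trying the renamed directory the same way shows whether the new name is accepted on that machine.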