Created 06-24-2019 06:43 PM
Background: We have a 3-node cluster: nodeone, nodetwo, nodethree. We have successfully installed Ambari and services such as HDFS, ZooKeeper, Ambari Metrics, SmartSense, and Kafka. Later, we tried to add the YARN service, which comes along with MapReduce (i.e., YARN + MapReduce2). The services were added successfully; however, we are facing problems starting YARN components such as ResourceManager, YARN Registry DNS, and Timeline Service V2.0 Reader.
Services already installed and working fine:
HDFS, ZooKeeper, Ambari Metrics, SmartSense, Kafka.
Services failed to start:
ResourceManager / YARN (attached log: Resource Manager_Yarn.txt)
YARN Registry DNS / YARN
Timeline Service V2.0 Reader / YARN
I understand the NameNode should be running prior to starting YARN, and it is running as expected.
As per my investigation of the logs, I am getting an error saying 'Your request could not be processed because an error occurred contacting the DNS server'. Could you please suggest how to solve this issue?
Could you please help me start the YARN services? I will attach the logs for the ResourceManager and Timeline Service V2.0 Reader. Please let me know if any other information is required.
Thanks and regards,
Spandan
Created 06-24-2019 10:00 PM
YARN Registry DNS is a new component introduced in HDP 3.0, and it usually runs on port 53. The most probable issue is that the port is already in use by another process. To check that, follow the steps below.
Verify whether port 53 is available on the YARN host:
# nc -l `hostname -f` 53
Ncat: bind to x.x.x.x:53: Address already in use. QUITTING.
If it is in use, change the value of hadoop.registry.dns.bind-port and restart the Registry DNS.
Start the DNS server:
yarn --daemon start registrydns
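If the nc check reports the address already in use, a sketch like the following can help confirm the state of the port and identify the owning process before changing hadoop.registry.dns.bind-port. The helper function is illustrative and bash-specific, and ss/lsof availability varies by distribution:

```shell
# Hypothetical helper: succeeds if something is listening on the given local
# TCP port (uses bash's /dev/tcp; the subshell closes the connection on exit).
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

PORT=53
if port_in_use "$PORT"; then
  echo "Port $PORT is busy; identify the owner before changing hadoop.registry.dns.bind-port:"
  ss -lnp "( sport = :$PORT )" 2>/dev/null || lsof -i ":$PORT"
else
  echo "Port $PORT looks free for Registry DNS"
fi
```

If another process such as dnsmasq or systemd-resolved owns the port, either stop it or move Registry DNS to an unprivileged port via hadoop.registry.dns.bind-port in Ambari.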
Please revert
Created 06-28-2019 05:07 AM
Hi @Geoffrey Shelton Okot, thanks for your response.
Highly appreciate it! I had seen the same solution in one of your earlier posts, and with it I was able to resolve the Registry DNS issue, i.e., I was able to start the Registry DNS service. However, I am still not able to start the other services: YARN ResourceManager and Timeline Service V2.0 Reader / YARN. Could you please help me resolve this issue?
Thanks and regards,
Spandan Mohanty
Created 06-28-2019 06:26 AM
Can you share the Timeline Service V2.0 Reader / YARN logs?
Created 06-28-2019 07:38 AM
Created 06-28-2019 07:42 AM
Also attaching the logs for the YARN ResourceManager.
The error for all the services is the same: "Your request could not be processed because an error occurred contacting the DNS server. The DNS server may be temporarily unavailable, or there could be a network problem."
Created 06-28-2019 08:26 AM
The error indicates that your HDFS services are not healthy. Please check your NameNode logs. (/var/log/hadoop/hdfs/hadoop-hdfs-namenode-xxx.log)
Because even the following simple WebHDFS call is failing with a 503 error:
# curl -sS -L -w '%{http_code}' -X GET -d '' -H 'Content-Length: 0' 'http://nodetwo:50070/webhdfs/v1/ats/done/?op=GETFILESTATUS&user.name=hdfs'
returned status_code=503
The cause of the error is logged there as well, as follows:
The DNS server may be temporarily unavailable, or there could be a network problem.
Next Action:
1. Please check if your HDFS services are up and running and there are no errors in NameNode logs.
2. Before attempting to start your ResourceManager, you should first check that DNS name resolution is working fine and that all your cluster nodes are able to resolve each other by their FQDNs.
Once your DNS issue is fixed and all the cluster nodes are able to resolve each other properly, restart the affected components and validate that you are able to make the simple WebHDFS call to the NameNode. Only then should you attempt to restart the ResourceManager.
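The FQDN resolution check can be sketched as follows. The node names are the ones from this thread; adjust the list for your own cluster:

```shell
# Succeeds if the given hostname resolves (getent consults /etc/hosts as well as DNS).
resolves() { getent hosts "$1" >/dev/null; }

for host in nodeone nodetwo nodethree; do
  if resolves "$host"; then
    echo "$host -> $(getent hosts "$host" | awk '{print $1}')"
  else
    echo "$host DOES NOT resolve; fix DNS or /etc/hosts first"
  fi
done
```

Run this on every cluster node, not just one, since each node must be able to resolve all the others.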
# curl -sS -L -X GET 'http://nodetwo:50070/webhdfs/v1/ats/done/?op=GETFILESTATUS&user.name=hdfs'
Created 06-28-2019 09:51 AM
Thanks for your response @Jay Kumar SenSharma. When I started the NameNode from the Ambari UI, it says 'started'; however, when I checked the logs, there are a few errors. I am looking into the issue/errors now. Meanwhile, I will attach the NameNode log file (as seen on the Ambari UI), just in case anybody can help me figure it out: Task_log_error.txt
Created 06-28-2019 11:40 AM
Below are the errors you are encountering while starting HDFS/YARN:
2019-06-28 14:58:44,564 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://nodetwo:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. safemode: Call From nodetwo/172.16.217.206 to nodetwo:8020 failed on connection exception: java.net.ConnectException: Connection refused;
Network Error (dns_server_failure) Your request could not be processed because an error occurred contacting the DNS server. The DNS server may be temporarily unavailable, or there could be a network problem.
Please do the following while logged in as hdfs (assuming you start as the root user):
# su - hdfs
$ hdfs dfsadmin -safemode get
The above should confirm the NameNode is in safe mode.
$ hdfs dfsadmin -safemode leave
Validate that safe mode is off:
$ hdfs dfsadmin -safemode get
Then restart HDFS/YARN from Ambari; that should resolve the issue.
Created 06-29-2019 11:04 PM
Based on the recent error that you shared as "task-log-error.txt" we see the following failure:
Access Denied (authentication_failed) Your credentials could not be authenticated: "General authentication failure due to bad user ID or authentication token.". You will not be permitted access until your credentials can be verified. This is typically caused by an incorrect username and/or password, but could also be caused by network problems.
Possible Cause:
This error indicates that there is a network proxy configured for your cluster: all your requests to the NameNode (and possibly other components) are going via a proxy server, and that proxy server is configured for authentication.
So you will need to find the proxy settings added in any of the places listed below on your cluster hosts (including the NameNode host and the Ambari Server host).
Possible Identification:
You can find out whether the requests are passing through a proxy by running the same curl commands manually with the "-iLv" options.
# curl -iLv -X GET 'http://nodetwo:50070/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hdfs'
Possible Remedy:
Please search for any "http_proxy" or "proxy" setting defined on your cluster nodes, especially in the following files (for the "root" user and for the "hdfs" user as well), and then remove them if needed or define "no_proxy".
As the "root" user and the "hdfs" (or other) users:
# cat /etc/environment | grep proxy
# cat ~/.bash_profile | grep proxy
# cat /etc/profile | grep proxy
You can disable the proxy setting for internal domain communication using "export no_proxy" option.
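A minimal sketch of the no_proxy approach, using the node names mentioned in this thread (substitute your own FQDNs or internal domain suffix as appropriate):

```shell
# Exempt internal cluster hosts from the proxy (node names from this thread;
# add your nodes' FQDNs / internal domain suffix as appropriate).
export no_proxy="localhost,127.0.0.1,nodeone,nodetwo,nodethree"
echo "no_proxy=$no_proxy"
```

To make this persistent, add the export line to /etc/environment or the relevant user profiles checked above, so that both root and hdfs pick it up.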
To learn more, please refer to: https://www.shellhacks.com/linux-proxy-server-settings-set-proxy-command-line/