Unable to start YARN resource manager service in Apache Ambari

Background: We have a 3-node cluster with nodes named nodeone, nodetwo, and nodethree. We successfully installed Ambari along with HDFS, ZooKeeper, Ambari Metrics, SmartSense, and Kafka. Later, we added the YARN service, which comes bundled with MapReduce (YARN + MapReduce2). The services were added successfully, but we are unable to start YARN components such as ResourceManager, YARN Registry DNS, and Timeline Service V2.0 Reader.

Services already installed and working fine:

HDFS, ZooKeeper, Ambari Metrics, SmartSense, Kafka.

Services failed to start:

Resource Manager_Yarn.txt

YARN Registry DNS / YARN

Timeline Service V2.0 Reader / YARN

I understand that the NameNode should be running before YARN is started, and it is running as expected.

From my investigation of the logs, I am getting an error saying 'Your request could not be processed because an error occurred contacting the DNS server'. Could you please suggest how to solve this issue?

Could you please help me in starting YARN services? I will attach the logs for Resource Manager and Timeline Service V2.0 Reader. Please do let me know if any other information is required.

Thanks and regards,

Spandan

10 REPLIES

Mentor

@Spandan Mohanty

YARN Registry DNS is a new component introduced in HDP 3.0 and usually runs on port 53. The most probable cause is that the port is already in use by another process. To check, follow the steps below.

Verify that port 53 is available on the YARN host.

# nc -l `hostname -f` 53

Ncat: bind to x.x.x.x:53: Address already in use. QUITTING.

If the port is already in use (as in the output above), change the value of hadoop.registry.dns.bind-port and restart RegistryDNS.

Start the DNS Server

yarn --daemon start registrydns
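
To see whether something already holds port 53 without trying to bind to it, a minimal sketch along these lines may help. It assumes the ss tool from iproute2 is installed on the host; port_listed is a hypothetical helper name, not part of YARN or Ambari.

```shell
# Hypothetical helper: reads `ss -lntu` output on stdin and succeeds
# (exit 0) if any listening socket's local address ends in :PORT.
# Assumes the column layout printed when -t and -u are used together,
# where field 5 is "Local Address:Port".
port_listed() {
  awk -v p="$1" '{ n = split($5, a, ":"); if (a[n] == p) found = 1 } END { exit !found }'
}

# Usage (run as root on the RegistryDNS host):
#   ss -lntu | port_listed 53 && echo "port 53 is already in use"
```

Adding -p to the ss call (as root) also shows which process owns the socket.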

Please revert

Hi @Geoffrey Shelton Okot, thanks for your response.

I highly appreciate it! I had seen this same solution in one of your earlier posts, and with it I was able to start the RegistryDNS service. However, I am still not able to start the other services: YARN ResourceManager and Timeline Service V2.0 Reader / YARN. Could you please help me resolve this issue?


Thanks and regards,

Spandan Mohanty

Mentor

@Spandan Mohanty

Can you share the Timeline Service V2.0 Reader / YARN logs?

Also attaching the logs for Yarn Resource manager

Resource Manager_Yarn.txt

The errors for all the services are the same: "Your request could not be processed because an error occurred contacting the DNS server. The DNS server may be temporarily unavailable, or there could be a network problem."

Super Mentor

@Spandan Mohanty

The error indicates that your HDFS services are not healthy. Please check your NameNode logs. (/var/log/hadoop/hdfs/hadoop-hdfs-namenode-xxx.log)

Even the following simple WebHDFS call is failing with a 503 error:

# curl -sS -L -w '%{http_code}' -X GET -d '' -H 'Content-Length: 0' 'http://nodetwo:50070/webhdfs/v1/ats/done/?op=GETFILESTATUS&user.name=hdfs'' returned status_code=503. 


The cause of the error is also logged, as follows:

The DNS server may be temporarily unavailable, or there could be a network problem.


Next Action:

1. Please check that your HDFS services are up and running and that there are no errors in the NameNode logs.

2. Before attempting to start your ResourceManagers, first check that DNS name resolution is working and that all your cluster nodes can resolve each other using their FQDNs.
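
The name-resolution check in step 2 can be sketched roughly as below. This is only an illustration: unresolved_hosts and the RESOLVE_CMD override are hypothetical names, and the node names are the short names from this thread, so substitute your actual FQDNs.

```shell
# Hypothetical helper: prints every host name that fails to resolve.
# RESOLVE_CMD is an assumed override hook (handy for testing); by
# default the lookup uses `getent hosts`, which goes through NSS and
# therefore consults /etc/hosts and DNS as configured on the node.
unresolved_hosts() {
  for h in "$@"; do
    ${RESOLVE_CMD:-getent hosts} "$h" >/dev/null 2>&1 || printf '%s\n' "$h"
  done
}

# Usage (run on every node of the cluster):
#   unresolved_hosts nodeone nodetwo nodethree
```

An empty result means every name resolved on that node; any name printed needs fixing in DNS or /etc/hosts before the YARN components are restarted.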


Once the DNS issue is fixed and all the cluster nodes can resolve each other properly, restart the affected components and validate that you can make a simple WebHDFS call to the NameNode. Only then should you attempt to restart ResourceManager.

# curl -sS -L -X GET 'http://nodetwo:50070/webhdfs/v1/ats/done/?op=GETFILESTATUS&user.name=hdfs'



Thanks for your response @Jay Kumar SenSharma. When I started the NameNode from the Ambari UI, it said 'started'; however, when I checked the logs, there were a few errors. I am looking into them now. Meanwhile, I will attach the NameNode log file (as seen in the Ambari UI), in case anybody can help me figure it out.

Task_log_error.txt

Mentor

@Spandan Mohanty

Below are the errors you are encountering while starting HDFS/YARN:

2019-06-28 14:58:44,564 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://nodetwo:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. safemode: Call From nodetwo/172.16.217.206 to nodetwo:8020 failed on connection exception: java.net.ConnectException: Connection refused;

Network Error (dns_server_failure) Your request could not be processed because an error occurred contacting the DNS server.The DNS server may be temporarily unavailable, or there could be a network problem.

Please do the following as the hdfs user, assuming you are currently logged in as root:


# su - hdfs
$ hdfs dfsadmin -safemode get

The above should confirm that the NameNode is in safe mode.

$ hdfs dfsadmin -safemode leave


Validate that safe mode is off:

$ hdfs dfsadmin -safemode get

Then restart HDFS/YARN from Ambari; that should resolve the issue.
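
If you want to script the safe mode check, here is a minimal sketch. safemode_state is a hypothetical helper name; it only looks for the "Safe mode is ON/OFF" text that hdfs dfsadmin prints. Note that with the "Connection refused" error shown above, the command may not reach the NameNode at all, in which case the result is "unknown" rather than on or off.

```shell
# Hypothetical helper: classifies the output of
# `hdfs dfsadmin -safemode get` read from stdin.
safemode_state() {
  case "$(cat)" in
    *"Safe mode is OFF"*) echo off ;;
    *"Safe mode is ON"*)  echo on ;;
    *)                    echo unknown ;;   # e.g. NameNode not reachable
  esac
}

# Usage (as the hdfs user):
#   hdfs dfsadmin -safemode get 2>&1 | safemode_state
```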


Super Mentor

@Spandan Mohanty

Based on the recent error that you shared as "task-log-error.txt" we see the following failure:

Access Denied (authentication_failed)

Your credentials could not be authenticated: "General authentication failure due to bad user ID or authentication token.". You will not be permitted access until your credentials can be verified.

This is typically caused by an incorrect username and/or password, but could also be caused by network problems.


Possible Cause:

This error indicates that a network proxy has been added to your cluster, so that all your requests to the NameNode ... (and maybe other components) are going via the proxy server, and that proxy server is configured for authentication.


So you will need to find the proxy settings added in any of the places listed below on your cluster hosts (including the NameNode host and the Ambari Server host).


Possible Identification:

You can find out whether the requests are passing via a proxy by running the same curl command manually with the "-iLv" options.

# curl -iLv -X GET 'http://nodetwo:50070/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hdfs'
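
One way to narrow this down further is to compare the HTTP status returned with the environment's proxy settings against the status returned when curl is told to bypass any proxy. The sketch below assumes curl's --noproxy option is available; http_code and proxy_verdict are hypothetical helper names.

```shell
# Prints only the HTTP status code for a request ("000" if no response).
# Extra arguments are passed straight through to curl.
http_code() {
  curl -s -o /dev/null -w '%{http_code}' "$@"
}

# Hypothetical helper: compares the status code seen through the
# environment's proxy settings with the one seen when bypassing them.
proxy_verdict() {
  if [ "$1" = "$2" ]; then
    echo "same result with and without proxy"
  else
    echo "proxy changes the result"
  fi
}

# Usage:
#   url='http://nodetwo:50070/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hdfs'
#   proxy_verdict "$(http_code "$url")" "$(http_code --noproxy '*' "$url")"
```

If the two codes differ (for example 503 through the proxy and 200 directly), the proxy settings are the likely culprit.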


Possible Remedy:

Please search for "http_proxy" or "proxy" settings defined on your cluster nodes, especially in the following files (for the "root" user and for the "hdfs" user as well), and then remove them if needed or define "no_proxy".

Run these as the "root" user, and again as "hdfs" and any other relevant users:

# grep -i proxy /etc/environment
# grep -i proxy ~/.bash_profile
# grep -i proxy /etc/profile

You can disable the proxy setting for internal domain communication using "export no_proxy" option.

To learn more, please refer to: https://www.shellhacks.com/linux-proxy-server-settings-set-proxy-command-line/
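
As a rough sketch of the "export no_proxy" approach, the value can be built from the cluster host names. build_no_proxy is a hypothetical helper, and the host names are the short names used in this thread; add your FQDNs or internal domain as needed. Where the export should live (for example /etc/environment or a profile file) depends on your setup and is an assumption here.

```shell
# Hypothetical helper: joins the local entries and the given host names
# into a comma-separated list suitable for the no_proxy variable.
build_no_proxy() {
  list="localhost,127.0.0.1"
  for h in "$@"; do
    list="$list,$h"
  done
  printf '%s\n' "$list"
}

# Usage:
#   export no_proxy="$(build_no_proxy nodeone nodetwo nodethree)"
#   export NO_PROXY="$no_proxy"   # some tools read the upper-case name
```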



Mentor

@Spandan Mohanty

Both log files indicate that DNS is the problem. Could you verify that the DNS server is running?

start-timeline-service-v2-0-reader.txt

The DNS server may be temporarily unavailable, or there could be a network problem.

resource-manager-yarn.txt

Your request could not be processed because an error occurred contacting the DNS server.

Please revert