
History server fails to start on a new HA HDP 2.3.4.7-4 cluster

Rising Star

HDP-2.3.4.7-4

Ambari Version 2.2.1.1

All services are up and running except for History server. Could not find any related errors in namenode or data node logs.

Following is the error reported by Ambari.

  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 191, in run_command
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT -T /usr/hdp/2.3.4.7-4/hadoop/mapreduce.tar.gz 'http://standbynamenode.sample.com:50070/webhdfs/v1/hdp/apps/2.3.4.7-4/mapreduce/mapreduce.tar.gz?op=CREATE&user.name=hdfs&overwrite=True&permission=444'' returned status_code=403.
{
  "RemoteException": {
    "exception": "ConnectException",
    "javaClassName": "java.net.ConnectException",
    "message": "Call From datanode.sample.com/10.250.98.101 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused"
  }
}

Does status code 403 mean that the request itself is well-formed but probably not authorized?

Any pointers will be helpful.

Thanks,

1 ACCEPTED SOLUTION

Rising Star

Got it!

fs.defaultFS - This is in core-site.xml.

The value should be set to hdfs://namespaceid (where namespaceid is the namespace that has been defined in the cluster). It works.


8 REPLIES 8

Super Guru
@Mohana Murali Gurunathan

I can see that, while starting, it is trying to write /usr/hdp/2.3.4.7-4/hadoop/mapreduce.tar.gz to /hdp/apps/2.3.4.7-4/mapreduce/mapreduce.tar.gz on HDFS. It is unable to write to HDFS because of a connection refused error.

If you look at the logs carefully, you can see that instead of the namenode hostname, the datanode is trying to connect to localhost:8020, which fails as expected.

"exception": "ConnectException", "javaClassName": "java.net.ConnectException", "message": "Call From datanode.sample.com/10.250.98.101 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused"

Can you please check the /etc/hosts file on all the datanodes, just to ensure that you have added correct entries for the namenode?

Rising Star

@Kuldeep - Yes, the /etc/hosts file on all the nodes (including the datanodes) has the right entries for the namenode and the other nodes in the cluster. True, it is not clear why the datanode is trying to connect to port 8020 on localhost. It should have contacted the namenode. This is a freshly created cluster and no operations have started yet.

Rising Star

@Kuldeep - I tried some hadoop operations like ls and put.

Every command fails because each request connects to localhost:8020 rather than to the active or the standby namenode. I checked the configs involving 8020; see the attached file.

8020.jpg

Super Guru

@Mohana Murali Gurunathan - Please remove localhost and use the hostname of your namenode in the configuration for fs.defaultFS.

current value - localhost:8020

recommended value - <hostname-of-namenode>:8020

Rising Star

Thanks, Kuldeep, for your inputs. I finally found the reason: the value should be the namespace that we chose for the cluster, because the cluster I was working on is an HA cluster. If we put a specific hostname there, we will be in trouble whenever that host is unavailable (down). Using the namespace avoids that problem.
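
To illustrate why a logical name holds up better than a single hostname on an HA cluster, here is a minimal sketch of how a nameservice maps to more than one namenode. It assumes the standard HA properties dfs.ha.namenodes.<nameservice> and dfs.namenode.rpc-address.<nameservice>.<namenode-id>. The nameservice name "mycluster", the IDs "nn1" and "nn2", and the host namenode.sample.com are made up for the example and do not come from this cluster.

```python
def namenode_rpc_addresses(hdfs_site, nameservice):
    """Map each namenode ID behind a nameservice to its RPC address.

    hdfs_site is a plain dict of hdfs-site.xml properties.
    Returns an empty dict when the nameservice is not defined.
    """
    raw_ids = hdfs_site.get("dfs.ha.namenodes." + nameservice, "")
    ids = [i.strip() for i in raw_ids.split(",") if i.strip()]
    return {
        nn: hdfs_site.get("dfs.namenode.rpc-address.%s.%s" % (nameservice, nn))
        for nn in ids
    }


# Hypothetical HA settings; every value here is an assumption for illustration.
EXAMPLE_HDFS_SITE = {
    "dfs.nameservices": "mycluster",
    "dfs.ha.namenodes.mycluster": "nn1,nn2",
    "dfs.namenode.rpc-address.mycluster.nn1": "namenode.sample.com:8020",
    "dfs.namenode.rpc-address.mycluster.nn2": "standbynamenode.sample.com:8020",
}

if __name__ == "__main__":
    for nn_id, address in namenode_rpc_addresses(EXAMPLE_HDFS_SITE, "mycluster").items():
        print(nn_id, address)
```

With fs.defaultFS pointing at the nameservice, clients have two candidate addresses to try, so losing one host does not break every request.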

Rising Star

Please, I have the same problem but I don't understand your reply. Could you please explain?

Rising Star

Got it!

fs.defaultFS - This is in core-site.xml.

The value should be set to hdfs://namespaceid (where namespaceid is the namespace that has been defined in the cluster). It works.

Rising Star

Please note that the namespaceid referred to here is not the one you find in the file /hadoop/hdfs/namenode/current/VERSION. It is the value of the dfs.nameservices property.
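
The fix described in this thread can be checked with a small script. The following is only a sketch under stated assumptions: it takes the contents of core-site.xml and hdfs-site.xml as strings and reports whether fs.defaultFS points at localhost or at something other than a name listed in dfs.nameservices. The helper names read_hadoop_conf and check_default_fs and the nameservice "mycluster" are made up for illustration.

```python
import xml.etree.ElementTree as ET


def read_hadoop_conf(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    conf = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def check_default_fs(core_site, hdfs_site):
    """Return a list of problems with fs.defaultFS (an empty list means none found)."""
    problems = []
    default_fs = core_site.get("fs.defaultFS", "")
    nameservices = [
        ns.strip()
        for ns in hdfs_site.get("dfs.nameservices", "").split(",")
        if ns.strip()
    ]

    prefix = "hdfs://"
    if default_fs.startswith(prefix):
        authority = default_fs[len(prefix):].split("/", 1)[0]
    else:
        problems.append("fs.defaultFS does not start with hdfs://")
        authority = default_fs.split("/", 1)[0]

    host = authority.split(":", 1)[0]
    if host in ("localhost", "127.0.0.1"):
        problems.append("fs.defaultFS points at localhost")

    # Only meaningful on an HA cluster, where dfs.nameservices is set.
    if nameservices and authority not in nameservices:
        problems.append(
            "fs.defaultFS authority %r is not one of dfs.nameservices %r"
            % (authority, nameservices)
        )
    return problems


# Hypothetical file contents; "mycluster" is an assumed nameservice name.
CORE_SITE_BAD = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://localhost:8020</value></property>
</configuration>"""
CORE_SITE_OK = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>
</configuration>"""
HDFS_SITE = """<configuration>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
</configuration>"""

if __name__ == "__main__":
    hdfs_conf = read_hadoop_conf(HDFS_SITE)
    for label, text in (("bad", CORE_SITE_BAD), ("ok", CORE_SITE_OK)):
        print(label, check_default_fs(read_hadoop_conf(text), hdfs_conf))
```

On a real cluster the two XML strings would be read from the deployed configuration files instead of the inline examples.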