HDFS connection error when reading in pyspark

Hi,

@Jay Kumar SenSharma

I'm getting the following error when trying to access a file on HDFS. I am able to ping "node1.mydomain" from the other machine where this pyspark script is running.

File "/opt/<mysoftware>/depLibs/usr/local/lib/python2.7/site-packages/hdfs/client.py", line 44, in _on_error
    raise HdfsError(message)
HdfsError: <HTML><HEAD>
<TITLE>Network Error</TITLE>
</HEAD>
<BODY>
<FONT face="Helvetica">
<big><strong></strong></big><BR>
</FONT>
<blockquote>
<TABLE border=0 cellPadding=1 width="80%">
<TR><TD>
<FONT face="Helvetica">
<big>Network Error (dns_unresolved_hostname)</big>
<BR>
<BR>
</FONT>
</TD></TR>
<TR><TD>
<FONT face="Helvetica">
Your requested host "node1.mydomain" could not be resolved by DNS.
</FONT>
</TD></TR>
<TR><TD>
<FONT face="Helvetica">

</FONT>
</TD></TR>
<TR><TD>
<FONT face="Helvetica" SIZE=2>
<BR>
For assistance, contact your network support team.
</FONT>
</TD></TR>
</TABLE>
</blockquote>
</FONT>
</BODY></HTML>
1 ACCEPTED SOLUTION

@Jay Kumar SenSharma Thanks for your help. It turned out that I had to remove the proxy configurations and restart all services, rather than just the HDFS service, for the change to take effect. After that, the HDFS read operation started working 🙂

6 REPLIES

@Jay Kumar SenSharma Requesting your help on this issue.

Master Mentor

@K D

When a Spark job runs, it may execute on various cluster hosts (nodes), so you need to make sure that the DNS entry (or the "/etc/hosts" file entry) is configured properly on every host in the cluster so that each one can resolve "node1.mydomain".

So please check that every cluster node has a correct "/etc/hosts" entry to resolve "node1.mydomain", as follows. (10.10.10.10 is only an example IP address for your node1.mydomain; please replace it with the actual IP address.)

10.10.10.10  node1.mydomain
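
To confirm what each node actually resolves, a small check like the one below can be run on every cluster host. This is only a minimal sketch using Python's standard `socket` module; the helper name `resolve` is made up for illustration, and the hostname is the one from the error message above.

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses the local resolver gives for hostname.

    An empty list means this machine could not resolve the name through
    /etc/hosts or DNS (whichever the system resolver is configured to use).
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    host = "node1.mydomain"  # hostname taken from the error message
    addrs = resolve(host)
    print("{0} -> {1}".format(host, ", ".join(addrs) if addrs else "NOT RESOLVED"))
```

If one node prints "NOT RESOLVED" while the others print the expected address, that node's "/etc/hosts" or DNS configuration is the place to look.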


@Jay Kumar SenSharma Thanks for your reply. Yes, I have entries in /etc/hosts. All of this was working before I restarted all of the Hortonworks services.

Master Mentor

@K D

It looks like hostname resolution in your network is happening via the DNS server instead of the "/etc/hosts" file.

If it is failing only for a particular host, another possible cause may be that this node has some environmental difference from the rest of the cluster that prevents DNS resolution from working properly.


The network infrastructure team might be able to help resolve the DNS issues, as I suspect this is related to the DNS settings.

@Jay Kumar SenSharma Thanks for your help. It turned out that I had to remove the proxy configurations and restart all services, rather than just the HDFS service, for the change to take effect. After that, the HDFS read operation started working 🙂

Master Mentor
@K D

Wow!! Good that you shared the findings. Yes, an explicit proxy setting forces the cluster nodes to send their requests through the network proxy, which might not be aware of "node1.mydomain". So it is good to remove the proxy settings.
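
For anyone hitting the same error, a quick way to see whether proxy settings are in play on a node is to inspect the usual proxy environment variables. The sketch below is only an illustration: the helper names `proxy_settings` and `host_bypasses_proxy` are hypothetical, the `no_proxy` matching is deliberately simplified (exact name or domain suffix only), and it assumes the proxy was configured through environment variables, which may not be how it was set in this cluster.

```python
import os

# Conventional proxy environment variable names (assumed, not taken from this thread).
PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def proxy_settings(environ=None):
    """Return the proxy variables that are set and non-empty."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in PROXY_VARS if environ.get(name)}


def host_bypasses_proxy(hostname, environ=None):
    """Simplified check: is hostname covered by a no_proxy entry?"""
    environ = os.environ if environ is None else environ
    no_proxy = environ.get("no_proxy") or environ.get("NO_PROXY") or ""
    # Strip whitespace and a leading dot so ".mydomain" and "mydomain" match alike.
    entries = [e.strip().lstrip(".") for e in no_proxy.split(",") if e.strip()]
    return any(hostname == e or hostname.endswith("." + e) for e in entries)


if __name__ == "__main__":
    host = "node1.mydomain"  # hostname taken from the error message
    print("proxy variables set: {0}".format(proxy_settings()))
    print("{0} bypasses proxy: {1}".format(host, host_bypasses_proxy(host)))
```

If a proxy variable is set and the host is not covered by `no_proxy`, HTTP requests to that host may be routed through the proxy, which matches the behaviour described above.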

It would also be great to close this thread by clicking the "Accept" button so that other HCC users can quickly find the solution when they encounter the same issue.