<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Unable to access HDFS Namenode from Python library - Max retries exceeded with url in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-access-HDFS-Namenode-from-Python-library-Max/m-p/78793#M54997</link>
    <description>&lt;P&gt;It is a single node cluster, NN is the DN.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Also, why is it able to list the directory contents but cannot seem to read/write from it?&lt;/P&gt;</description>
    <pubDate>Wed, 22 Aug 2018 13:36:23 GMT</pubDate>
    <dc:creator>AKB</dc:creator>
    <dc:date>2018-08-22T13:36:23Z</dc:date>
    <item>
      <title>Unable to access HDFS Namenode from Python library - Max retries exceeded with url</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-access-HDFS-Namenode-from-Python-library-Max/m-p/78755#M54995</link>
      <description>&lt;P&gt;CDH 5.15 single-node cluster installed using CM on CentOS 7.x on an AWS EC2 instance.&amp;nbsp; 8 CPU, 64 GB RAM.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Verified WebHDFS is running, and I am connecting from a remote machine (non-Hadoop client) after connecting to the environment using an SSH key.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Using the PyWebHdfsClient library to list, read and write files on HDFS.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The following code works -&lt;/P&gt;
&lt;PRE&gt;hdfs = PyWebHdfsClient(host='IP_ADDR', port='50070', user_name='hdfs', timeout=1)  # your Namenode IP &amp;amp; username here
my_dir = 'ds-datalake/misc'
pprint(hdfs.list_dir(my_dir))&lt;/PRE&gt;
&lt;P&gt;{u'FileStatuses': {u'FileStatus': [{u'accessTime': 1534856157369L,&lt;BR /&gt;u'blockSize': 134217728,&lt;BR /&gt;u'childrenNum': 0,&lt;BR /&gt;u'fileId': 25173,&lt;BR /&gt;u'group': u'supergroup',&lt;BR /&gt;u'length': 28,&lt;BR /&gt;u'modificationTime': 1534856157544L,&lt;BR /&gt;u'owner': u'centos',&lt;BR /&gt;u'pathSuffix': u'sample.txt',&lt;BR /&gt;u'permission': u'644',&lt;BR /&gt;u'replication': 3,&lt;BR /&gt;u'storagePolicy': 0,&lt;BR /&gt;u'type': u'FILE'}]}}&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But when I try to read/write at the same location, using something like this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;my_file = 'ds-datalake/misc/sample.txt'
print(hdfs.read_file(my_file))&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I get the following error:&lt;/P&gt;
&lt;P&gt;requests.exceptions.ConnectionError: HTTPConnectionPool(host='HOST_NAME', port=50075): Max retries exceeded with url: /webhdfs/v1/ds-datalake/misc/sample.txt?op=OPEN&amp;amp;user.name=hdfs&amp;amp;namenoderpcaddress=HOST_NAME:8020&amp;amp;offset=0 (Caused by NewConnectionError('&amp;lt;urllib3.connection.HTTPConnection object at 0x00000000068F4828&amp;gt;: Failed to establish a new connection: [Errno 11001] getaddrinfo failed',))&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is what the HDFS folder looks like:&lt;/P&gt;
&lt;P&gt;hadoop fs -ls /ds-datalake/misc&lt;BR /&gt;Found 1 items&lt;BR /&gt;-rwxrwxrwx 3 centos supergroup 28 2018-08-21 12:55 /ds-datalake/misc/sample.txt&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Can you please help me? I have two single node test clusters and this happens on both. HDFS Namenode UI comes up fine from the CM site and all services look healthy.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Aug 2018 18:00:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-access-HDFS-Namenode-from-Python-library-Max/m-p/78755#M54995</guid>
      <dc:creator>AKB</dc:creator>
      <dc:date>2018-08-21T18:00:05Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to access HDFS Namenode from Python library - Max retries exceeded with url</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-access-HDFS-Namenode-from-Python-library-Max/m-p/78781#M54996</link>
      <description>It appears as though your remote (client) machine has network access and/or DNS resolution only for the NameNode host, but not for the DataNode hosts.&lt;BR /&gt;&lt;BR /&gt;When using the WebHDFS protocol at the NameNode, a CREATE or READ call will typically result in the NameNode sending back a 30x code (typically 307) to redirect your client to a chosen target DataNode service that will handle the rest of the data-oriented work. The NameNode only handles metadata requests and does not want to be burdened with actual data-streaming overheads, so it redirects clients to one of the 'worker' WebHDFS servlet hosts (i.e. DataNodes).&lt;BR /&gt;&lt;BR /&gt;This is documented at &lt;A href="http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/WebHDFS.html" target="_blank"&gt;http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/WebHDFS.html&lt;/A&gt;, and you should be able to verify this in your error: the HOST_NAME that you've masked away for port 50075 is a DataNode service host/port.&lt;BR /&gt;&lt;BR /&gt;Ensure your client can connect to and name-resolve all DataNode hostnames/ports, besides just the NameNode, for the WebHDFS client to work.&lt;BR /&gt;&lt;BR /&gt;If you need a more one-stop-gateway solution, run an HttpFS service and point your client code to just that web host:port instead of the NameNode web address. The HttpFS service's WebHDFS API will not require redirection, as it acts as a 'proxy' and handles all calls for you from one location.&lt;BR /&gt;</description>
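The two-step redirect described in the reply above can be sketched with a small local simulation; both "servers" below are stand-ins on loopback ports, not a real cluster, and the file path is a placeholder:

```python
# Minimal local simulation of the WebHDFS redirect: a stand-in
# "NameNode" answers an OPEN call with a 307 pointing at a stand-in
# "DataNode", which serves the actual bytes.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

DATA = b"hello from the datanode"

class DataNode(BaseHTTPRequestHandler):
    # Stand-in for the DataNode WebHDFS servlet: streams the file bytes.
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(DATA)))
        self.end_headers()
        self.wfile.write(DATA)

    def log_message(self, *args):
        pass  # keep the demo quiet

class NameNode(BaseHTTPRequestHandler):
    # Stand-in for the NameNode: metadata only, redirects data reads.
    def do_GET(self):
        self.send_response(307)
        self.send_header("Location",
                         f"http://127.0.0.1:{dn_port}{self.path}")
        self.end_headers()

    def log_message(self, *args):
        pass

dn_srv = HTTPServer(("127.0.0.1", 0), DataNode)
dn_port = dn_srv.server_address[1]
nn_srv = HTTPServer(("127.0.0.1", 0), NameNode)
nn_port = nn_srv.server_address[1]
for srv in (dn_srv, nn_srv):
    threading.Thread(target=srv.serve_forever, daemon=True).start()

# urllib follows the 307 automatically; with a real cluster this second
# hop is the one that fails when the DataNode host cannot be resolved.
body = urllib.request.urlopen(
    f"http://127.0.0.1:{nn_port}/webhdfs/v1/sample.txt?op=OPEN").read()
print(body.decode())
```

If the client cannot resolve or reach the host in the Location header, the second hop fails in exactly the way the "Max retries exceeded" error above shows.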
      <pubDate>Wed, 22 Aug 2018 11:12:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-access-HDFS-Namenode-from-Python-library-Max/m-p/78781#M54996</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2018-08-22T11:12:55Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to access HDFS Namenode from Python library - Max retries exceeded with url</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-access-HDFS-Namenode-from-Python-library-Max/m-p/78793#M54997</link>
      <description>&lt;P&gt;It is a single node cluster, NN is the DN.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Also, why is it able to list the directory contents but cannot seem to read/write from it?&lt;/P&gt;</description>
      <pubDate>Wed, 22 Aug 2018 13:36:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-access-HDFS-Namenode-from-Python-library-Max/m-p/78793#M54997</guid>
      <dc:creator>AKB</dc:creator>
      <dc:date>2018-08-22T13:36:23Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to access HDFS Namenode from Python library - Max retries exceeded with url</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-access-HDFS-Namenode-from-Python-library-Max/m-p/78794#M54998</link>
      <description>Yes, but is your client able to (a) resolve the hostname of the DN/NN (you&lt;BR /&gt;seem to be using an IP in your code) and (b) does it have permission&lt;BR /&gt;(firewall, etc.) to connect to the DN web port?&lt;BR /&gt;</description>
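The two checks in this reply can be scripted from the client side; a minimal sketch, where the hostname and the (pre-CDH6) DataNode web port 50075 are taken from the error message as placeholders:

```python
# Client-side check for the two conditions above:
# (a) can we resolve the hostname, (b) can we open a TCP connection?
import socket

def check_endpoint(host, port, timeout=3.0):
    # (a) DNS resolution; failure here is analogous to the
    # "[Errno 11001] getaddrinfo failed" in the error above.
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror as exc:
        return f"DNS failed for {host}: {exc}"
    # (b) TCP connect; failure here usually means a firewall or
    # security-group rule is blocking the port.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return f"connect failed for {host}:{port}: {exc}"

print(check_endpoint("ip-172-31-26-58.ec2.internal", 50075))
```

Run this once per DataNode host; every one of them must come back "ok" for WebHDFS reads and writes to succeed.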
      <pubDate>Wed, 22 Aug 2018 13:37:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-access-HDFS-Namenode-from-Python-library-Max/m-p/78794#M54998</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2018-08-22T13:37:55Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to access HDFS Namenode from Python library - Max retries exceeded with url</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-access-HDFS-Namenode-from-Python-library-Max/m-p/78795#M54999</link>
      <description>&lt;P&gt;Another thing came to mind. I am using Elastic IP for the public IP address which is what I put in the code. It does resolve to the private IP as I can see in the error.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;requests.exceptions.ConnectionError: HTTPConnectionPool(host='ip-172-31-26-58.ec2.internal', port=50075): Max retries exceeded with url: /webhdfs/v1/tmp/sample.txt?op=OPEN&amp;amp;user.name=hdfs&amp;amp;namenoderpcaddress=ip-172-31-26-58.ec2.internal:8020&amp;amp;offset=0 (Caused by NewConnectionError('&amp;lt;urllib3.connection.HTTPConnection object at 0x0000000007693828&amp;gt;: Failed to establish a new connection: [Errno 11001] getaddrinfo failed',))&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Security group is also configured to allow entry for these ports from my work IP address range.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Aug 2018 13:41:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-access-HDFS-Namenode-from-Python-library-Max/m-p/78795#M54999</guid>
      <dc:creator>AKB</dc:creator>
      <dc:date>2018-08-22T13:41:06Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to access HDFS Namenode from Python library - Max retries exceeded with url</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-access-HDFS-Namenode-from-Python-library-Max/m-p/78806#M55000</link>
      <description>&lt;P&gt;I found this which is somewhat relevant, I think -&amp;nbsp;&lt;A href="https://rainerpeter.wordpress.com/2014/02/12/connect-to-hdfs-running-in-ec2-using-public-ip-addresses/" target="_blank"&gt;https://rainerpeter.wordpress.com/2014/02/12/connect-to-hdfs-running-in-ec2-using-public-ip-addresses/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But my problem is that I am trying to connect from a remote non-Hadoop edge node machine, so there are no Hadoop config files here.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Aug 2018 14:03:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-access-HDFS-Namenode-from-Python-library-Max/m-p/78806#M55000</guid>
      <dc:creator>AKB</dc:creator>
      <dc:date>2018-08-22T14:03:50Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to access HDFS Namenode from Python library - Max retries exceeded with url</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-access-HDFS-Namenode-from-Python-library-Max/m-p/78882#M55001</link>
      <description>&lt;P&gt;Solution found.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In the hosts file of the Python client machine, add the cluster's public IP and private hostname.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is appropriate for a cloud service like AWS.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The Python lib works fine now.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks to&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/4054"&gt;@bgooley&lt;/a&gt;&amp;nbsp;for help on another thread that also resolved this.&lt;/P&gt;</description>
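The fix above amounts to one line in the client's hosts file; a sketch, where 203.0.113.10 is a placeholder Elastic IP and the private hostname is the one from the error message:

```shell
# On the client machine, map the instance's private EC2 hostname to
# its public (Elastic) IP so the WebHDFS redirect target resolves.
# Replace 203.0.113.10 with your own Elastic IP.
echo "203.0.113.10 ip-172-31-26-58.ec2.internal" | sudo tee -a /etc/hosts
```

On a Windows client the equivalent file is C:\Windows\System32\drivers\etc\hosts; add the same "IP hostname" line there, with one entry per DataNode host on multi-node clusters.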
      <pubDate>Thu, 23 Aug 2018 13:27:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-access-HDFS-Namenode-from-Python-library-Max/m-p/78882#M55001</guid>
      <dc:creator>AKB</dc:creator>
      <dc:date>2018-08-23T13:27:25Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to access HDFS Namenode from Python library - Max retries exceeded with url</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-access-HDFS-Namenode-from-Python-library-Max/m-p/83812#M55002</link>
      <description>&lt;P&gt;I am having the same problem. Can you please explain the 'hosts file' and how I can add the IP and hostname? Are we still using the IP and hostname of the NameNode?&lt;/P&gt;</description>
      <pubDate>Thu, 13 Dec 2018 06:20:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-access-HDFS-Namenode-from-Python-library-Max/m-p/83812#M55002</guid>
      <dc:creator>Raghav_35v6</dc:creator>
      <dc:date>2018-12-13T06:20:38Z</dc:date>
    </item>
  </channel>
</rss>

