<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Can not access Namenode in Pyspark but in Python it works? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-not-access-Namenode-in-Pyspark-but-in-Python-it-works/m-p/147702#M32450</link>
    <description>&lt;P&gt;&lt;A href="https://github.com/crs4/pydoop/issues/218" target="_blank"&gt;https://github.com/crs4/pydoop/issues/218&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 04 Jul 2016 16:20:47 GMT</pubDate>
    <dc:creator>lott3</dc:creator>
    <dc:date>2016-07-04T16:20:47Z</dc:date>
    <item>
      <title>Can not access Namenode in Pyspark but in Python it works?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-not-access-Namenode-in-Pyspark-but-in-Python-it-works/m-p/147697#M32445</link>
      <description>&lt;P&gt;I just want to use hdfs.open in my PySpark shell but get the following error.&lt;/P&gt;&lt;P&gt;Does anyone have an idea? In plain Python the hdfs.open function works, but in PySpark I cannot reach the NameNode. Why does it work in Python but not in PySpark?&lt;/P&gt;&lt;P&gt;Python 2.7 (Anaconda 4), Spark 1.6.0, Hadoop 2.4 (installed with Ambari)&lt;/P&gt;&lt;P&gt;I also asked on Stack Overflow: &lt;A href="http://stackoverflow.com/questions/37925300/pydoop-hdfs-ioexeption"&gt;Stackoverflow-Python-Pydoop-Hdfs&lt;/A&gt;&lt;/P&gt;&lt;PRE&gt;16/06/20 16:11:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hdfsBuilderConnect(forceNewInstance=0, nn=xipcc01, port=8020, kerbTicketCachePath=(NULL), userName=(NULL)) error:
java.io.IOException: No FileSystem for scheme: hdfs
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:160)
        at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:157)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:157)
Traceback (most recent call last):
  File "&amp;lt;stdin&amp;gt;", line 1, in &amp;lt;module&amp;gt;
  File "/home/cloud/anaconda2/lib/python2.7/site-packages/pydoop/hdfs/__init__.py", line 121, in open
    fs = hdfs(host, port, user)
  File "/home/cloud/anaconda2/lib/python2.7/site-packages/pydoop/hdfs/fs.py", line 150, in __init__
    h, p, u, fs = _get_connection_info(host, port, user)
  File "/home/cloud/anaconda2/lib/python2.7/site-packages/pydoop/hdfs/fs.py", line 64, in _get_connection_info
    fs = core_hdfs_fs(host, port, user)
  File "/home/cloud/anaconda2/lib/python2.7/site-packages/pydoop/hdfs/core/__init__.py", line 57, in core_hdfs_fs
    return _CORE_MODULE.CoreHdfsFs(host, port, user)
RuntimeError: (255, 'Unknown error 255')&lt;/PRE&gt;</description>
      <pubDate>Mon, 20 Jun 2016 21:39:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-not-access-Namenode-in-Pyspark-but-in-Python-it-works/m-p/147697#M32445</guid>
      <dc:creator>lott3</dc:creator>
      <dc:date>2016-06-20T21:39:28Z</dc:date>
    </item>
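    <item>
      <title>Editor's note: reproducing the classpath the error points at</title>
      <description>The Java-side cause in the stack trace above is "No FileSystem for scheme: hdfs": the JVM embedded by libhdfs (which Pydoop uses) cannot find the Hadoop HDFS jars on its CLASSPATH, so no implementation is registered for hdfs:// URIs. A minimal sketch of assembling such a classpath before connecting; the HDP-style directories below are assumptions, not taken from the thread.

```python
import os

# libhdfs reads the CLASSPATH environment variable when it boots its JVM.
# If hadoop-hdfs and friends are missing from it, FileSystem.get() fails
# with "No FileSystem for scheme: hdfs".
def build_classpath(jar_dirs):
    """Join jar directories into a Java classpath string, using wildcard
    entries so every jar in each directory is picked up."""
    return os.pathsep.join(d.rstrip("/") + "/*" for d in jar_dirs)

# Hypothetical HDP-style locations; adjust to your installation.
cp = build_classpath([
    "/usr/hdp/current/hadoop-client/",
    "/usr/hdp/current/hadoop-hdfs-client/lib/",
])
os.environ["CLASSPATH"] = cp  # must be set before pydoop.hdfs connects
print(cp)
```

On a node with the Hadoop client installed, the authoritative jar list can also be printed with the `hadoop classpath` command instead of hand-listing directories.</description>
    </item>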
    <item>
      <title>Re: Can not access Namenode in Pyspark but in Python it works?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-not-access-Namenode-in-Pyspark-but-in-Python-it-works/m-p/147698#M32446</link>
      <description>&lt;P&gt;Try adding the Spark assembly jar when launching pyspark:&lt;/P&gt;&lt;PRE&gt;pyspark --jars /usr/hdp/current/spark-client/lib/spark-assembly-&amp;lt;version&amp;gt;-hadoop2&amp;lt;version&amp;gt;.jar&lt;/PRE&gt;</description>
      <pubDate>Mon, 20 Jun 2016 23:04:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-not-access-Namenode-in-Pyspark-but-in-Python-it-works/m-p/147698#M32446</guid>
      <dc:creator>jyadav</dc:creator>
      <dc:date>2016-06-20T23:04:24Z</dc:date>
    </item>
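    <item>
      <title>Editor's note: sketch of the suggested launch command</title>
      <description>The reply above boils down to a single launch command. As a sketch, here is a small helper that builds it; the jar path is a placeholder (assumption), and the `&lt;version&gt;` pieces must be replaced with the versions that ship with your install.

```python
# Sketch of the suggested fix: distribute the Spark assembly jar (which
# bundles the Hadoop filesystem classes) via --jars when starting pyspark.
def pyspark_cmd(assembly_jar):
    """Build a pyspark launch command that ships an extra jar to the
    driver and executors via --jars."""
    return "pyspark --jars " + assembly_jar

# Placeholder path; substitute the real
# spark-assembly-<version>-hadoop2<version>.jar from your HDP install.
print(pyspark_cmd(
    "/usr/hdp/current/spark-client/lib/"
    "spark-assembly-<version>-hadoop2<version>.jar"
))
```
</description>
    </item>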
    <item>
      <title>Re: Can not access Namenode in Pyspark but in Python it works?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-not-access-Namenode-in-Pyspark-but-in-Python-it-works/m-p/147699#M32447</link>
      <description>&lt;P&gt;This is what I am executing with Pydoop in Jupyter:&lt;/P&gt;&lt;PRE&gt;import pydoop.hdfs as hdfs
file_X_train = hdfs.open("/path../.csv")&lt;/PRE&gt;</description>
      <pubDate>Tue, 21 Jun 2016 15:21:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-not-access-Namenode-in-Pyspark-but-in-Python-it-works/m-p/147699#M32447</guid>
      <dc:creator>lott3</dc:creator>
      <dc:date>2016-06-21T15:21:40Z</dc:date>
    </item>
    <item>
      <title>Re: Can not access Namenode in Pyspark but in Python it works?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-not-access-Namenode-in-Pyspark-but-in-Python-it-works/m-p/147700#M32448</link>
      <description>&lt;P&gt;This did not work (I also added a comment to my question with further information).&lt;/P&gt;</description>
      <pubDate>Tue, 21 Jun 2016 15:22:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-not-access-Namenode-in-Pyspark-but-in-Python-it-works/m-p/147700#M32448</guid>
      <dc:creator>lott3</dc:creator>
      <dc:date>2016-06-21T15:22:44Z</dc:date>
    </item>
    <item>
      <title>Re: Can not access Namenode in Pyspark but in Python it works?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-not-access-Namenode-in-Pyspark-but-in-Python-it-works/m-p/147701#M32449</link>
      <description>&lt;P&gt;&lt;A href="https://github.com/crs4/pydoop/issues/158" target="_blank"&gt;https://github.com/crs4/pydoop/issues/158&lt;/A&gt; this is the error I get - I use HDP 2.4 and Python 2.7 - This is why I am asking here...&lt;/P&gt;</description>
      <pubDate>Tue, 21 Jun 2016 15:31:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-not-access-Namenode-in-Pyspark-but-in-Python-it-works/m-p/147701#M32449</guid>
      <dc:creator>lott3</dc:creator>
      <dc:date>2016-06-21T15:31:59Z</dc:date>
    </item>
    <item>
      <title>Re: Can not access Namenode in Pyspark but in Python it works?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-not-access-Namenode-in-Pyspark-but-in-Python-it-works/m-p/147702#M32450</link>
      <description>&lt;P&gt;&lt;A href="https://github.com/crs4/pydoop/issues/218" target="_blank"&gt;https://github.com/crs4/pydoop/issues/218&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 04 Jul 2016 16:20:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-not-access-Namenode-in-Pyspark-but-in-Python-it-works/m-p/147702#M32450</guid>
      <dc:creator>lott3</dc:creator>
      <dc:date>2016-07-04T16:20:47Z</dc:date>
    </item>
    <item>
      <title>Re: Can not access Namenode in Pyspark but in Python it works?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-not-access-Namenode-in-Pyspark-but-in-Python-it-works/m-p/147703#M32451</link>
      <description>&lt;P&gt;Hi Lukas,&lt;/P&gt;&lt;P&gt;PySpark's SparkContext (sc) also has its own methods to read data from HDFS: sc.textFile(...), sc.wholeTextFiles(...), and sc.binaryFiles(...). Why not try those? They directly return an RDD for the data you read in. If you use these SparkContext methods, make sure to add your core-site.xml and hdfs-site.xml config files to the Spark conf dir; note that the Spark conf dir can be set to any desired location via the SPARK_CONF_DIR environment variable.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Jul 2016 14:54:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-not-access-Namenode-in-Pyspark-but-in-Python-it-works/m-p/147703#M32451</guid>
      <dc:creator>m_a_vervuurt</dc:creator>
      <dc:date>2016-07-05T14:54:24Z</dc:date>
    </item>
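    <item>
      <title>Editor's note: sketch of the SPARK_CONF_DIR approach</title>
      <description>The advice above can be sketched as follows. The config directory, NameNode host, port, and file path are placeholders (assumptions), and the SparkContext part needs a live cluster, so it is shown as comments only.

```python
import os

# Point Spark at the directory holding core-site.xml and hdfs-site.xml so
# the hdfs:// scheme resolves (placeholder path; adjust for your setup).
os.environ["SPARK_CONF_DIR"] = "/etc/spark/conf"

# With a running cluster one would then read straight into an RDD:
#   from pyspark import SparkContext
#   sc = SparkContext(appName="hdfs-read")
#   rdd = sc.textFile("hdfs://namenode-host:8020/user/cloud/data.csv")

def hdfs_uri(host, port, path):
    """Build the fully qualified hdfs:// URI that sc.textFile would pass
    to the NameNode."""
    return "hdfs://{0}:{1}{2}".format(host, port, path)

# Hypothetical host and path, for illustration only.
print(hdfs_uri("namenode-host", 8020, "/user/cloud/data.csv"))
```
</description>
    </item>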
  </channel>
</rss>

