<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: pyspark permission errors in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-permission-errors/m-p/138128#M35313</link>
    <description>&lt;P&gt;Someone had added two entries to spark-defaults.conf, spark.yarn.keytab and spark.yarn.principal, which caused spark-shell and pyspark to submit YARN jobs as the "spark" user.&lt;/P&gt;&lt;P&gt;Removing them fixed it.&lt;/P&gt;</description>
    <pubDate>Fri, 29 Jul 2016 02:55:20 GMT</pubDate>
    <dc:creator>james_jones</dc:creator>
    <dc:date>2016-07-29T02:55:20Z</dc:date>
    <item>
      <title>pyspark permission errors</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-permission-errors/m-p/138125#M35310</link>
      <description>&lt;P&gt;How can we get pyspark to submit YARN jobs as the end user? We have data in a private directory (700) that a user owns. He can select the data with HiveServer2's beeline, but with pyspark he gets permission denied because the job is submitted as the "spark" user instead of as the end user. This is a kerberized cluster with the Ranger Hive and HDFS plugins. He has access to the directory in question, just not with pyspark.&lt;/P&gt;&lt;P&gt;He is mostly using Jupyter via JupyterHub, which is using PAM authentication, but I think he has also run this with bin/pyspark with the same results.&lt;/P&gt;&lt;P&gt;Here is the code:&lt;/P&gt;&lt;PRE&gt;from pyspark import SparkContext, SparkConf
SparkContext.setSystemProperty('spark.executor.memory', '2g')
conf = SparkConf()
conf.set('spark.executor.instances', 4)
sc = SparkContext('yarn-client', 'myapp', conf=conf)
rdd = sc.textFile('/user/johndoe/.staging/test/student.txt')
rdd.cache()
rdd.count()&lt;/PRE&gt;&lt;P&gt;And the error:&lt;/P&gt;&lt;PRE&gt;Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.hadoop.security.AccessControlException: Permission denied: user=spark, access=EXECUTE, inode="/user/johndoe/.staging/test/student.txt":johndoe:hdfs:drwx------
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
        at org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer$RangerAccessControlEnforcer.checkPermission(RangerHdfsAuthorizer.java:305)&lt;/PRE&gt;</description>
      <pubDate>Wed, 20 Jul 2016 22:53:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-permission-errors/m-p/138125#M35310</guid>
      <dc:creator>james_jones</dc:creator>
      <dc:date>2016-07-20T22:53:05Z</dc:date>
    </item>
    <item>
      <title>Re: pyspark permission errors</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-permission-errors/m-p/138126#M35311</link>
      <description>&lt;P style="margin-left: 40px;"&gt; &lt;A rel="user" href="https://community.cloudera.com/users/3076/bmathew.html" nodeid="3076"&gt;@Binu Mathew&lt;/A&gt; any ideas?&lt;/P&gt;</description>
      <pubDate>Wed, 20 Jul 2016 22:59:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-permission-errors/m-p/138126#M35311</guid>
      <dc:creator>sunile_manjee</dc:creator>
      <dc:date>2016-07-20T22:59:43Z</dc:date>
    </item>
    <item>
      <title>Re: pyspark permission errors</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-permission-errors/m-p/138127#M35312</link>
      <description>&lt;P&gt;Since you are using Jupyter with Spark, you might consider looking at Livy. Livy is an open source REST server for Spark. When you execute a code cell in a PySpark notebook, it creates a Livy session to execute your code. Livy allows multiple users to share the same Spark server through "impersonation support", which should allow you to access objects as your logged-in user. The link below documents the REST commands you can use (for instance, you can use the &lt;CODE&gt;%%info&lt;/CODE&gt; magic to display the current Livy session information):&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/cloudera/livy/tree/6fe1e80cfc72327c28107e0de20c818c1f13e027#post-sessions"&gt;https://github.com/cloudera/livy/tree/6fe1e80cfc72327c28107e0de20c818c1f13e027#post-sessions&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jul 2016 05:43:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-permission-errors/m-p/138127#M35312</guid>
      <dc:creator>phargis</dc:creator>
      <dc:date>2016-07-21T05:43:44Z</dc:date>
    </item>
    <item>
      <title>Re: pyspark permission errors</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-permission-errors/m-p/138128#M35313</link>
      <description>&lt;P&gt;Someone had added two entries to spark-defaults.conf, spark.yarn.keytab and spark.yarn.principal, which caused spark-shell and pyspark to submit YARN jobs as the "spark" user.&lt;/P&gt;&lt;P&gt;Removing them fixed it.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Jul 2016 02:55:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/pyspark-permission-errors/m-p/138128#M35313</guid>
      <dc:creator>james_jones</dc:creator>
      <dc:date>2016-07-29T02:55:20Z</dc:date>
    </item>
  </channel>
</rss>

