<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Cannot find a saved DataFrame on disk in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185078#M77997</link>
    <description>&lt;P&gt;Yes, sure. Sorry, I was actually referring to "hdfs://eureambarimaster1.local.eurecat.org:8020/user/hdfs/test/df.parquet" &lt;/P&gt;&lt;P&gt;Let me test it.&lt;/P&gt;</description>
    <pubDate>Mon, 07 May 2018 20:30:00 GMT</pubDate>
    <dc:creator>liana_napalkova</dc:creator>
    <dc:date>2018-05-07T20:30:00Z</dc:date>
    <item>
      <title>Cannot find a saved DataFrame on disk</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185071#M77990</link>
      <description>&lt;P&gt;I want to save a DataFrame to disk:&lt;/P&gt;&lt;PRE&gt;df.write.format("parquet").save("/home/centos/test/df.parquet")&lt;/PRE&gt;&lt;P&gt;I get the following error, which says that the user "centos" does not have write permissions:&lt;/P&gt;&lt;PRE&gt;18/05/07 09:18:08 ERROR ApplicationMaster: User class threw exception: org.apache.hadoop.security.AccessControlException: Permission denied: user=centos, access=WRITE, inode="/home/centos/test/df.parquet/_temporary/0":hdfs:hdfs:drwxr-xr-x&lt;/PRE&gt;&lt;P&gt;This is how I run the spark-submit command:&lt;/P&gt;&lt;PRE&gt;spark-submit  --master yarn  --deploy-mode cluster  --driver-memory 6g  --executor-cores 2  --num-executors 2  --executor-memory 4g  --class org.test.MyProcessor  mytest.jar&lt;/PRE&gt;</description>
      <pubDate>Mon, 07 May 2018 16:38:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185071#M77990</guid>
      <dc:creator>liana_napalkova</dc:creator>
      <dc:date>2018-05-07T16:38:38Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot find a saved DataFrame on disk</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185072#M77991</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/users/70685/liananapalkova.html"&gt;&lt;EM&gt;@Liana Napalkova&lt;/EM&gt;&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;You are trying to save to a local filesystem path, &lt;STRONG&gt;/home/centos/---/---/&lt;/STRONG&gt;, and the error stack above shows that the &lt;STRONG&gt;user&lt;/STRONG&gt; and &lt;STRONG&gt;group&lt;/STRONG&gt; of that directory are &lt;STRONG&gt;hdfs:hdfs&lt;/STRONG&gt;. The user centos doesn't have the correct permissions and ownership of this directory. This has nothing to do with your earlier &lt;STRONG&gt;hdfs&lt;/STRONG&gt; directory, where you set the correct permissions.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Please do the following while logged in on the Linux CLI as centos:&lt;/EM&gt;&lt;/P&gt;&lt;PRE&gt;centos@{host}$ id&lt;/PRE&gt;&lt;P&gt;&lt;EM&gt;This will show the group that centos belongs to, which you need for the change-ownership command. Then, as the root user or a sudoer, run the following, where xxxxx is the group:&lt;/EM&gt;&lt;/P&gt;&lt;PRE&gt;# chown -R centos:xxxxx  /home/centos/---/---/&lt;/PRE&gt;&lt;P&gt;&lt;EM&gt;Hope that helps.&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 07 May 2018 17:34:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185072#M77991</guid>
      <dc:creator>Shelton</dc:creator>
      <dc:date>2018-05-07T17:34:20Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot find a saved DataFrame on disk</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185073#M77992</link>
      <description>&lt;P&gt;The output of "id":&lt;/P&gt;&lt;PRE&gt;uid=1000(centos) gid=1000(centos) groups=1000(centos),4(adm),10(wheel),190(systemd-journal)&lt;/PRE&gt;&lt;P&gt;I executed "chown -R centos:centos  /home/centos/test" but I still get the same error:&lt;/P&gt;&lt;PRE&gt;18/05/07 12:06:28 ERROR ApplicationMaster: User class threw exception: org.apache.hadoop.security.AccessControlException: Permission denied: user=centos, access=WRITE, inode="/home/centos/test/df.parquet/_temporary/0":hdfs:hdfs:drwxr-xr-x&lt;/PRE&gt;&lt;P&gt;This is the output of "ls -la" executed in "/home/centos":&lt;/P&gt;&lt;PRE&gt;total 36236
drwx------.  4 centos centos     4096 May  7 12:34 .
drwxr-xr-x. 15 root   root       4096 Apr 16 18:41 ..
-rw-------.  1 centos centos    13781 May  7 11:26 .bash_history
-rw-r--r--.  1 centos centos       18 Mar  5  2015 .bash_logout
-rw-r--r--.  1 centos centos      193 Mar  5  2015 .bash_profile
-rw-r--r--.  1 centos centos      231 Mar  5  2015 .bashrc
-rw-rw-r--   1 centos centos       47 May  7 11:38 .scala_history
drwx------.  2 centos centos       46 May  2 07:57 .ssh
drwxrwxr-x   4 centos centos      144 May  7 11:42 test&lt;/PRE&gt;</description>
      <pubDate>Mon, 07 May 2018 19:37:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185073#M77992</guid>
      <dc:creator>liana_napalkova</dc:creator>
      <dc:date>2018-05-07T19:37:49Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot find a saved DataFrame on disk</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185074#M77993</link>
      <description>&lt;P&gt;Maybe the problem is that I run the Spark program in YARN cluster mode? That means the driver can be running on any of the machines in the cluster. So should I run "chown -R centos:centos ..." on each machine, or use ".coalesce(1)"?&lt;/P&gt;</description>
      <pubDate>Mon, 07 May 2018 19:43:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185074#M77993</guid>
      <dc:creator>liana_napalkova</dc:creator>
      <dc:date>2018-05-07T19:43:22Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot find a saved DataFrame on disk</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185075#M77994</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/70685/liananapalkova.html" nodeid="70685"&gt;@Liana Napalkova&lt;/A&gt; &lt;/P&gt;&lt;P&gt;The &lt;STRONG&gt;.save&lt;/STRONG&gt; action in Spark writes the data to HDFS, but you changed the permissions on the local file system.&lt;/P&gt;&lt;P&gt;Please change the ownership of the &lt;STRONG&gt;/home/centos&lt;/STRONG&gt; directory in HDFS instead.&lt;/P&gt;&lt;P&gt;Log in as the &lt;STRONG&gt;hdfs&lt;/STRONG&gt; user and run:&lt;/P&gt;&lt;PRE&gt;hdfs dfs -chown -R centos /home/centos/*&lt;/PRE&gt;</description>
      <pubDate>Mon, 07 May 2018 19:51:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185075#M77994</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2018-05-07T19:51:35Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot find a saved DataFrame on disk</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185076#M77995</link>
      <description>&lt;P&gt;I think that this is the reason. If I log in as the hdfs user and run "hdfs dfs -chown -R centos /home/centos/test", it says that this directory does not exist. I created this directory as the hdfs user and then changed the permissions to centos. Should I write the Parquet file using the full path?&lt;/P&gt;&lt;PRE&gt;df.coalesce(1).write.format("parquet").save("hdfs://eureambarimaster1.local.eurecat.org:8020/user/hdfs/test")&lt;/PRE&gt;</description>
      <pubDate>Mon, 07 May 2018 20:05:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185076#M77995</guid>
      <dc:creator>liana_napalkova</dc:creator>
      <dc:date>2018-05-07T20:05:22Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot find a saved DataFrame on disk</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185077#M77996</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/70685/liananapalkova.html" nodeid="70685"&gt;@Liana Napalkova&lt;/A&gt;&lt;BR /&gt;Use &lt;STRONG&gt;write.mode&lt;/STRONG&gt; to specify whether to &lt;STRONG&gt;overwrite&lt;/STRONG&gt; or &lt;STRONG&gt;append&lt;/STRONG&gt;, so that Spark will write the file to the test directory:&lt;BR /&gt;&lt;PRE&gt;df.coalesce(1).write.mode("overwrite").format("parquet").save("/user/hdfs/test")&lt;/PRE&gt;&lt;P&gt;If you don't specify a mode, Spark will fail with a "directory already exists" error, because you have already created the &lt;STRONG&gt;test&lt;/STRONG&gt; directory.&lt;/P&gt;</description>
      <pubDate>Mon, 07 May 2018 20:19:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185077#M77996</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2018-05-07T20:19:48Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot find a saved DataFrame on disk</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185078#M77997</link>
      <description>&lt;P&gt;Yes, sure. Sorry, I was actually referring to "hdfs://eureambarimaster1.local.eurecat.org:8020/user/hdfs/test/df.parquet" &lt;/P&gt;&lt;P&gt;Let me test it.&lt;/P&gt;</description>
      <pubDate>Mon, 07 May 2018 20:30:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185078#M77997</guid>
      <dc:creator>liana_napalkova</dc:creator>
      <dc:date>2018-05-07T20:30:00Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot find a saved DataFrame on disk</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185079#M77998</link>
      <description>&lt;P&gt;I have just tested it. It worked fine! Thank you!&lt;/P&gt;</description>
      <pubDate>Mon, 07 May 2018 20:45:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Cannot-find-a-saved-DataFrame-on-disk/m-p/185079#M77998</guid>
      <dc:creator>liana_napalkova</dc:creator>
      <dc:date>2018-05-07T20:45:34Z</dc:date>
    </item>
  </channel>
</rss>

