<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Using HDFS as local storage for yarn cluster driver in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Using-HDFS-as-local-storage-for-yarn-cluster-driver/m-p/378854#M243702</link>
    <description>&lt;P&gt;Hello, I'm new to Hadoop and would like to know: can I use HDFS as local storage for my Spark driver?&lt;/P&gt;&lt;P&gt;For example, I submit a task through Livy with "kind": "pyspark" and a "code" field containing some operations that should create a new file.&lt;/P&gt;&lt;P&gt;When I run it in YARN cluster mode, the new file is created in the node's local storage, under a path like /tmp/hadoop-username/nm-local-dir/usercache/root/appcache......&lt;/P&gt;&lt;P&gt;Is there any way to set this path to a location in HDFS instead of the local file system? I want to save my Spark results (the newly created file) in HDFS.&lt;/P&gt;&lt;P&gt;When I set spark.local.dir or yarn.nodemanager.local-dirs to hdfs:///temp, the Livy session does not start at all.&lt;/P&gt;&lt;P&gt;Mounting HDFS with dfs-fuse does not seem like the best approach. Or should I use my own fileApp.jar that runs on every node and in every session?&lt;/P&gt;</description>
    <pubDate>Fri, 10 Nov 2023 08:41:15 GMT</pubDate>
    <dc:creator>one4like</dc:creator>
    <dc:date>2023-11-10T08:41:15Z</dc:date>
    <item>
      <title>Using HDFS as local storage for yarn cluster driver</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-HDFS-as-local-storage-for-yarn-cluster-driver/m-p/378854#M243702</link>
      <description>&lt;P&gt;Hello, I'm new to Hadoop and would like to know: can I use HDFS as local storage for my Spark driver?&lt;/P&gt;&lt;P&gt;For example, I submit a task through Livy with "kind": "pyspark" and a "code" field containing some operations that should create a new file.&lt;/P&gt;&lt;P&gt;When I run it in YARN cluster mode, the new file is created in the node's local storage, under a path like /tmp/hadoop-username/nm-local-dir/usercache/root/appcache......&lt;/P&gt;&lt;P&gt;Is there any way to set this path to a location in HDFS instead of the local file system? I want to save my Spark results (the newly created file) in HDFS.&lt;/P&gt;&lt;P&gt;When I set spark.local.dir or yarn.nodemanager.local-dirs to hdfs:///temp, the Livy session does not start at all.&lt;/P&gt;&lt;P&gt;Mounting HDFS with dfs-fuse does not seem like the best approach. Or should I use my own fileApp.jar that runs on every node and in every session?&lt;/P&gt;</description>
      <pubDate>Fri, 10 Nov 2023 08:41:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-HDFS-as-local-storage-for-yarn-cluster-driver/m-p/378854#M243702</guid>
      <dc:creator>one4like</dc:creator>
      <dc:date>2023-11-10T08:41:15Z</dc:date>
    </item>
    <item>
      <title>Re: Using HDFS as local storage for yarn cluster driver</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-HDFS-as-local-storage-for-yarn-cluster-driver/m-p/379052#M243768</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/107841"&gt;@one4like&lt;/a&gt;,&amp;nbsp;Welcome to our community! To help you get the best possible answer, I have tagged our Spark experts &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/78612"&gt;@RangaReddy&lt;/a&gt;&amp;nbsp; &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/81193"&gt;@Babasaheb&lt;/a&gt;&amp;nbsp;who may be able to assist you further.&lt;BR /&gt;&lt;BR /&gt;Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Nov 2023 09:40:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-HDFS-as-local-storage-for-yarn-cluster-driver/m-p/379052#M243768</guid>
      <dc:creator>VidyaSargur</dc:creator>
      <dc:date>2023-11-15T09:40:55Z</dc:date>
    </item>
    <item>
      <title>Re: Using HDFS as local storage for yarn cluster driver</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-HDFS-as-local-storage-for-yarn-cluster-driver/m-p/379082#M243774</link>
      <description>&lt;P&gt;Hello&amp;nbsp;<a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/107841">@one4like</a>,&lt;/P&gt;&lt;P&gt;Pushing every local file of a job to HDFS would cause problems, especially in larger clusters. The local directories are used as scratch space: map-side spills are written there, and moving that I/O onto the network would hurt performance. Scratch and shuffle files are kept on local storage precisely to avoid this. It would also have security implications, because the NodeManager (NM) would then push the keys of each application to a network location that others might be able to access.&lt;/P&gt;&lt;P&gt;A far better solution is to take advantage of the fact that yarn.nodemanager.local-dirs can list multiple mount points, which spreads the load across all of them.&lt;/P&gt;&lt;P&gt;So the answer is NO: local-dirs must contain a list of local paths. The NodeManager code contains an explicit check that only allows the local file system to be used.&lt;/P&gt;&lt;P&gt;See here: &lt;A href="https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java#L224" target="_blank"&gt;https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java#L224&lt;/A&gt;. Note that an exception is thrown when a non-local file system is referenced.&lt;/P&gt;&lt;P class="p1"&gt;Thank you.&lt;/P&gt;&lt;P class="p1"&gt;Bjagtap&lt;/P&gt;</description>
      <pubDate>Wed, 15 Nov 2023 15:50:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-HDFS-as-local-storage-for-yarn-cluster-driver/m-p/379082#M243774</guid>
      <dc:creator>Babasaheb</dc:creator>
      <dc:date>2023-11-15T15:50:33Z</dc:date>
    </item>
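The rule described in the answer above (every yarn.nodemanager.local-dirs entry must be a local path, and several local mount points may be listed to spread scratch I/O) can be illustrated with a small Python sketch. parse_local_dirs and assign_round_robin are hypothetical helpers written only for this illustration: they are not Hadoop or NodeManager APIs, and the round-robin placement is a simplification of how the NodeManager actually chooses a directory. The mount points /data1/yarn/local, /data2/yarn/local and /data3/yarn/local are assumed example paths.

```python
from itertools import cycle
from urllib.parse import urlparse


def parse_local_dirs(value):
    """Split a comma-separated local-dirs value into local paths.

    Hypothetical helper: it mimics the rule that only the local file system
    is allowed, but it is NOT the NodeManager's real validation code.
    Raises ValueError for entries such as hdfs:///temp.
    """
    dirs = []
    for raw in value.split(","):
        entry = raw.strip()
        if not entry:
            continue  # ignore empty items, e.g. from a trailing comma
        parsed = urlparse(entry)
        # A bare path ("/data1/yarn/local") or a file:// URI is local;
        # any other scheme (hdfs, s3a, ...) names a remote file system.
        if parsed.scheme not in ("", "file"):
            raise ValueError("non-local file system in local-dirs: " + entry)
        dirs.append(parsed.path)
    return dirs


def assign_round_robin(dirs, files):
    """Simplified model: spread scratch files across mount points in turn."""
    return dict(zip(files, cycle(dirs)))


if __name__ == "__main__":
    local_dirs = parse_local_dirs(
        "/data1/yarn/local,/data2/yarn/local,file:///data3/yarn/local"
    )
    print(assign_round_robin(local_dirs, ["spill-0", "spill-1", "spill-2", "spill-3"]))
    try:
        parse_local_dirs("hdfs:///temp")  # the value tried in the question
    except ValueError as err:
        print("rejected:", err)
```

Listing one directory per physical disk is the load-spreading approach the answer recommends, while an hdfs:// entry is rejected outright, which mirrors the behaviour reported in the question, where pointing local-dirs at hdfs:///temp kept the Livy session from starting.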
  </channel>
</rss>

