<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Deleting Directory in HDFS using Spark in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Deleting-Directory-in-HDFS-using-Spark/m-p/133067#M34918</link>
    <description>&lt;P&gt;Can someone provide the code snippet to delete a directory in HDFS using Spark/Spark-Streaming?&lt;/P&gt;&lt;P&gt;I am using spark-streaming to process some incoming data which is leading to blank directories in HDFS as it works on micro-batching, so I want a clean up job that can delete the empty directories.&lt;/P&gt;&lt;P&gt;Please provide any other suggestions as well, the solution needs to be in Java.&lt;/P&gt;</description>
    <pubDate>Mon, 18 Jul 2016 14:45:41 GMT</pubDate>
    <dc:creator>gmarya</dc:creator>
    <dc:date>2016-07-18T14:45:41Z</dc:date>
    <item>
      <title>Deleting Directory in HDFS using Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Deleting-Directory-in-HDFS-using-Spark/m-p/133067#M34918</link>
      <description>&lt;P&gt;Can someone provide the code snippet to delete a directory in HDFS using Spark/Spark-Streaming?&lt;/P&gt;&lt;P&gt;I am using spark-streaming to process some incoming data which is leading to blank directories in HDFS as it works on micro-batching, so I want a clean up job that can delete the empty directories.&lt;/P&gt;&lt;P&gt;Please provide any other suggestions as well, the solution needs to be in Java.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jul 2016 14:45:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Deleting-Directory-in-HDFS-using-Spark/m-p/133067#M34918</guid>
      <dc:creator>gmarya</dc:creator>
      <dc:date>2016-07-18T14:45:41Z</dc:date>
    </item>
    <item>
      <title>Re: Deleting Directory in HDFS using Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Deleting-Directory-in-HDFS-using-Spark/m-p/133068#M34919</link>
      <description>&lt;P&gt;If you are using Java, you can delete the HDFS path with Hadoop's FileSystem class:
hdfs.delete(new org.apache.hadoop.fs.Path(output), true)&lt;/P&gt;&lt;P&gt;In Spark you may try the approach below; I haven't tried it myself, though.
&lt;A href="https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAHUQ+_ZwpDpfs1DaFW9zFFzJVW1PKTQ74kR2qbTqrBy7T96K9A@mail.gmail.com%3E" target="_blank"&gt;https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAHUQ+_ZwpDpfs1DaFW9zFFzJVW1PKTQ74kR2qbTqrBy7T96K9A@mail.gmail.com%3E&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jul 2016 15:00:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Deleting-Directory-in-HDFS-using-Spark/m-p/133068#M34919</guid>
      <dc:creator>nyadav</dc:creator>
      <dc:date>2016-07-18T15:00:05Z</dc:date>
    </item>
    <item>
      <title>Re: Deleting Directory in HDFS using Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Deleting-Directory-in-HDFS-using-Spark/m-p/133069#M34920</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/7520/nyadav.html" nodeid="7520"&gt;@nyadav&lt;/A&gt; I found that already. Any suggestions on how to delete only the directories that have no data in them and leave behind the ones with data?&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jul 2016 15:02:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Deleting-Directory-in-HDFS-using-Spark/m-p/133069#M34920</guid>
      <dc:creator>gmarya</dc:creator>
      <dc:date>2016-07-18T15:02:58Z</dc:date>
    </item>
    <item>
      <title>Re: Deleting Directory in HDFS using Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Deleting-Directory-in-HDFS-using-Spark/m-p/133070#M34921</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10352/gmarya.html" nodeid="10352"&gt;@Gautam Marya&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Can you try this?&lt;/P&gt;&lt;P&gt;val fs = org.apache.hadoop.fs.FileSystem.get(new java.net.URI("hdfs://sandbox.hortonworks.com:8030"), sc.hadoopConfiguration)&lt;/P&gt;&lt;P&gt;fs.delete(new org.apache.hadoop.fs.Path("/tmp/xyz"), true) // recursive = true&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jul 2016 15:14:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Deleting-Directory-in-HDFS-using-Spark/m-p/133070#M34921</guid>
      <dc:creator>rajkumar_singh</dc:creator>
      <dc:date>2016-07-18T15:14:35Z</dc:date>
    </item>
    <item>
      <title>Re: Deleting Directory in HDFS using Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Deleting-Directory-in-HDFS-using-Spark/m-p/133071#M34922</link>
      <description>&lt;P&gt;Does this delete only the directories that have no data in them and leave the directories with data untouched?
The point is to remove only directories that have no data.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jul 2016 15:25:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Deleting-Directory-in-HDFS-using-Spark/m-p/133071#M34922</guid>
      <dc:creator>gmarya</dc:creator>
      <dc:date>2016-07-18T15:25:31Z</dc:date>
    </item>
    <item>
      <title>Re: Deleting Directory in HDFS using Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Deleting-Directory-in-HDFS-using-Spark/m-p/133072#M34923</link>
      <description>&lt;P&gt;Have you tried avoiding the folders with empty files in the first place?&lt;/P&gt;&lt;P&gt;As an idea, instead of using&lt;/P&gt;&lt;PRE&gt;&amp;lt;DStream&amp;gt;
.saveAsTextFiles("/tmp/results/ts", "json");&lt;/PRE&gt;&lt;P&gt;(which creates folders with empty files if nothing gets streamed from the source), I tried&lt;/P&gt;&lt;PRE&gt;&amp;lt;DStream&amp;gt;
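.foreachRDD(rdd =&amp;gt; {
  try {
    rdd.first() // throws UnsupportedOperationException for empty RDDs
    rdd.saveAsTextFile(s"/tmp/results/ts-${System.currentTimeMillis}.json")
  } catch {
    // swallow only the "empty collection" error so real save failures still propagate
    case _: UnsupportedOperationException =&amp;gt; println("empty rdd")
  }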
})&lt;/PRE&gt;&lt;P&gt;It seems to work for me: no more folders with empty files.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jul 2016 16:27:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Deleting-Directory-in-HDFS-using-Spark/m-p/133072#M34923</guid>
      <dc:creator>bwalter1</dc:creator>
      <dc:date>2016-07-18T16:27:54Z</dc:date>
    </item>
    <item>
      <title>Re: Deleting Directory in HDFS using Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Deleting-Directory-in-HDFS-using-Spark/m-p/133073#M34924</link>
      <description>&lt;P&gt;Sorry, it's Scala code, but the Java version should work similarly.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jul 2016 16:30:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Deleting-Directory-in-HDFS-using-Spark/m-p/133073#M34924</guid>
      <dc:creator>bwalter1</dc:creator>
      <dc:date>2016-07-18T16:30:12Z</dc:date>
    </item>
    <item>
      <title>Re: Deleting Directory in HDFS using Spark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Deleting-Directory-in-HDFS-using-Spark/m-p/133074#M34925</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/452/bwalter.html" nodeid="452"&gt;@Bernhard Walter&lt;/A&gt; Thanks man, it worked. I wrote something similar in Java &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; &lt;/P&gt;</description>
      <pubDate>Tue, 19 Jul 2016 14:41:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Deleting-Directory-in-HDFS-using-Spark/m-p/133074#M34925</guid>
      <dc:creator>gmarya</dc:creator>
      <dc:date>2016-07-19T14:41:57Z</dc:date>
    </item>
  </channel>
</rss>

