<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: What is the fastest way to create large number of empty directories in hdfs? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-fastest-way-to-create-large-number-of-empty/m-p/226310#M67081</link>
    <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/12466/pbarna.html" nodeid="12466"&gt;@pbarna&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;you may use pig or grunt shell or hive CLI and pass all the directories at one shot which does much quicker. &lt;/P&gt;</description>
    <pubDate>Tue, 22 Aug 2017 22:03:42 GMT</pubDate>
    <dc:creator>bkosaraju</dc:creator>
    <dc:date>2017-08-22T22:03:42Z</dc:date>
    <item>
      <title>What is the fastest way to create large number of empty directories in hdfs?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-fastest-way-to-create-large-number-of-empty/m-p/226309#M67080</link>
      <description>&lt;P&gt;For testing purposes I want to create a very large number of empty directories in HDFS, say 1 million.&lt;/P&gt;&lt;P&gt;What I tried is using `hdfs dfs -mkdir` to create 8K directories at a time and repeating this in a for loop.&lt;/P&gt;&lt;PRE&gt;for i in {1..125}
do
   dirs=""
   for j in {1..8000}; do
     dirs="$dirs /user/d$i.$j"
   done
   echo "$dirs"
   hdfs dfs -mkdir $dirs
done
&lt;/PRE&gt;&lt;P&gt;Apparently it takes hours to create 1M folders this way.&lt;/P&gt;&lt;P&gt;My question is: what is the fastest way to create 1M empty folders?&lt;/P&gt;</description>
      <pubDate>Tue, 22 Aug 2017 20:16:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-fastest-way-to-create-large-number-of-empty/m-p/226309#M67080</guid>
      <dc:creator>bpgergo</dc:creator>
      <dc:date>2017-08-22T20:16:37Z</dc:date>
    </item>
    <item>
      <title>Re: What is the fastest way to create large number of empty directories in hdfs?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-fastest-way-to-create-large-number-of-empty/m-p/226310#M67081</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/12466/pbarna.html" nodeid="12466"&gt;@pbarna&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;you may use pig or grunt shell or hive CLI and pass all the directories at one shot which does much quicker. &lt;/P&gt;</description>
      <pubDate>Tue, 22 Aug 2017 22:03:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-fastest-way-to-create-large-number-of-empty/m-p/226310#M67081</guid>
      <dc:creator>bkosaraju</dc:creator>
      <dc:date>2017-08-22T22:03:42Z</dc:date>
    </item>
    <item>
      <title>Re: What is the fastest way to create large number of empty directories in hdfs?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-fastest-way-to-create-large-number-of-empty/m-p/226311#M67082</link>
      <description>&lt;P&gt;Thanks for your response, &lt;A rel="user" href="https://community.cloudera.com/users/15193/bkosaraju.html" nodeid="15193"&gt;@bkosaraju&lt;/A&gt;. Can you give me an example of any of the options you mentioned?&lt;/P&gt;</description>
      <pubDate>Tue, 22 Aug 2017 23:00:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-fastest-way-to-create-large-number-of-empty/m-p/226311#M67082</guid>
      <dc:creator>bpgergo</dc:creator>
      <dc:date>2017-08-22T23:00:35Z</dc:date>
    </item>
    <item>
      <title>Re: What is the fastest way to create large number of empty directories in hdfs?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-fastest-way-to-create-large-number-of-empty/m-p/226312#M67083</link>
      <description>&lt;P&gt;I did a simple test with your code, running the commands through the Hive CLI, and it completed in a few seconds.&lt;/P&gt;&lt;P&gt;It is also wise to split the directories across multiple passes.&lt;/P&gt;&lt;PRE&gt;#!/bin/bash
tgetfl=/tmp/hvdir_$(date +%s)
for i in {1..125}
do
   dirs=""
   for j in {1..8000}; do
     dirs="$dirs /dirtst/d$i.$j"
   done
   #echo "$dirs"
   # Each Hive CLI command must end with ";"; -p also creates /dirtst if it is missing
   echo "dfs -mkdir -p $dirs;"
done &amp;gt; "$tgetfl"
date
hive -f "$tgetfl"
date
&lt;/PRE&gt;</description>
      <pubDate>Wed, 23 Aug 2017 10:37:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-fastest-way-to-create-large-number-of-empty/m-p/226312#M67083</guid>
      <dc:creator>bkosaraju</dc:creator>
      <dc:date>2017-08-23T10:37:03Z</dc:date>
    </item>
    <item>
      <title>Re: What is the fastest way to create large number of empty directories in hdfs?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-fastest-way-to-create-large-number-of-empty/m-p/226313#M67084</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/12466/pbarna.html" nodeid="12466"&gt;@pbarna&lt;/A&gt;&lt;P&gt;I think the Java API should be the fastest.&lt;/P&gt;&lt;PRE&gt;FileSystem fs = FileSystem.get(URI.create(hdfsUri), conf);

class DirectoryThread extends Thread {

  private int from;
  private int count;
  private static final String basePath = "/user/d";

  public DirectoryThread(int from, int count) {
    this.from = from;
    this.count = count;
  }

  @Override
  public void run() {
    for (int i = from; i &amp;lt; from + count; i++) {
      Path path = new Path(basePath + i);
      try {
        fs.mkdirs(path);
      } catch (IOException e) {
        e.printStackTrace();
      }
    }
  }
}

long startTime = System.currentTimeMillis();
int threadCount = 8;
Thread threads[] = new Thread[threadCount];
int total = 1000000;
int countPerThread = total / threadCount;
for (int j = 0; j &amp;lt; threadCount; j++) {
  Thread thread = new DirectoryThread(j * countPerThread, countPerThread);
  thread.start();
  threads[j] = thread;
}
for (Thread thread : threads) {
  thread.join();
}
long endTime = System.currentTimeMillis();

System.out.println("Total: " + (endTime - startTime) + " milliseconds");&lt;/PRE&gt;&lt;P&gt;Obviously, use as many threads as you can. But still, this takes 1-2 minutes, I wonder how &lt;A rel="user" href="https://community.cloudera.com/users/15193/bkosaraju.html" nodeid="15193"&gt;@bkosaraju&lt;/A&gt; could "complete in few seconds with your code"&lt;/P&gt;</description>
      <pubDate>Wed, 23 Aug 2017 16:05:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-is-the-fastest-way-to-create-large-number-of-empty/m-p/226313#M67084</guid>
      <dc:creator>gnovak</dc:creator>
      <dc:date>2017-08-23T16:05:22Z</dc:date>
    </item>
  </channel>
</rss>

