<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How do you move files but not the directories in hdfs? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-move-files-but-not-the-directories-in-hdfs/m-p/123918#M17713</link>
    <description>&lt;P&gt;Starting in HDP 2.3, the Hadoop shell ships with a find command.  Full details are available in the &lt;A href="http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/FileSystemShell.html#find"&gt;FileSystemShell find documentation&lt;/A&gt; in Apache.&lt;/P&gt;&lt;P&gt;However, unlike the standard Unix command, the Hadoop version does not yet implement the "maxdepth" or "type" options shown in your example.  There are several uncommitted patches still in progress to add these features.  &lt;A href="https://issues.apache.org/jira/browse/HADOOP-10578"&gt;HADOOP-10578&lt;/A&gt; implements "maxdepth".  &lt;A href="https://issues.apache.org/jira/browse/HADOOP-10579"&gt;HADOOP-10579&lt;/A&gt; implements "type".  These features are not yet available in any release of either HDP or Apache Hadoop.&lt;/P&gt;&lt;P&gt;Until these features become generally available, I think your only other option is to use wildcard glob matching as suggested in prior answers.  I understand you said that there is some variability to the names because of dates and times embedded into the names.  You would need to find a way to stage these files in a predictable way, so that you can effectively use a wildcard to match only the files that you want to match.  This might require renaming files or moving them into a different directory structure at time of ingest.&lt;/P&gt;&lt;P&gt;Another possible option could be to script it externally, such as by using bash to run an ls -R command, parse the results, and then call the Hadoop shell again using only the files that you want.  However, this would introduce overhead from needing to start a separate Hadoop shell process (a JVM) for each command, which might be unacceptable.&lt;/P&gt;</description>
    <pubDate>Thu, 04 Feb 2016 01:45:47 GMT</pubDate>
    <dc:creator>cnauroth</dc:creator>
    <dc:date>2016-02-04T01:45:47Z</dc:date>
    <item>
      <title>How do you move files but not the directories in hdfs?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-move-files-but-not-the-directories-in-hdfs/m-p/123914#M17709</link>
      <description>&lt;P&gt;I've been trying to find a solution to this problem for a while. On a normal file system, you can use this shell command to move all files under a location while leaving the directories alone.&lt;/P&gt;&lt;PRE&gt;    find . -maxdepth 1 -type f -exec mv {} destination_path \;&lt;/PRE&gt;&lt;P&gt;I was wondering whether there is a command that does the same in HDFS.&lt;/P&gt;&lt;P&gt;Suppose I have a folder in HDFS called "folder1" that contains the files "copyThis.txt", "copyThisAsWell.txt" and "theFinalCopy.txt" as well as a folder "doNotCopy". I want to copy the files into a new folder called "folder2" but leave the folder "doNotCopy" behind. How can this be done in HDFS?&lt;/P&gt;&lt;P&gt;Thanks for any help you can provide.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Feb 2016 01:07:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-move-files-but-not-the-directories-in-hdfs/m-p/123914#M17709</guid>
      <dc:creator>daniel_perry</dc:creator>
      <dc:date>2016-02-04T01:07:31Z</dc:date>
    </item>
    <item>
      <title>Re: How do you move files but not the directories in hdfs?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-move-files-but-not-the-directories-in-hdfs/m-p/123915#M17710</link>
      <description>&lt;P&gt;hdfs dfs -cp folder1/filename filedestination/&lt;/P&gt;&lt;P&gt;Specify each file explicitly, or use the wildcard * to match several files at once, for example hdfs dfs -cp folder1/*.txt filedestination/. &lt;A rel="user" href="https://community.cloudera.com/users/2430/danielperry.html" nodeid="2430"&gt;@Daniel Perry&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Feb 2016 01:07:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-move-files-but-not-the-directories-in-hdfs/m-p/123915#M17710</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-04T01:07:33Z</dc:date>
    </item>
    <item>
      <title>Re: How do you move files but not the directories in hdfs?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-move-files-but-not-the-directories-in-hdfs/m-p/123916#M17711</link>
      <description>&lt;P&gt;My issue is that the files to be copied have the date and time in their filenames and are updated daily, so it is next to impossible to know their names in advance.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Feb 2016 01:20:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-move-files-but-not-the-directories-in-hdfs/m-p/123916#M17711</guid>
      <dc:creator>daniel_perry</dc:creator>
      <dc:date>2016-02-04T01:20:24Z</dc:date>
    </item>
    <item>
      <title>Re: How do you move files but not the directories in hdfs?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-move-files-but-not-the-directories-in-hdfs/m-p/123917#M17712</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2430/danielperry.html" nodeid="2430"&gt;@Daniel Perry&lt;/A&gt; I recommend looking at Apache NiFi; it has strong features in that regard. You can track what has been processed, though perhaps not per file but per event, where an event is any line within a file.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Feb 2016 01:21:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-move-files-but-not-the-directories-in-hdfs/m-p/123917#M17712</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-04T01:21:35Z</dc:date>
    </item>
    <item>
      <title>Re: How do you move files but not the directories in hdfs?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-move-files-but-not-the-directories-in-hdfs/m-p/123918#M17713</link>
      <description>&lt;P&gt;Starting in HDP 2.3, the Hadoop shell ships with a find command.  Full details are available in the &lt;A href="http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/FileSystemShell.html#find"&gt;FileSystemShell find documentation&lt;/A&gt; in Apache.&lt;/P&gt;&lt;P&gt;However, unlike the standard Unix command, the Hadoop version does not yet implement the "maxdepth" or "type" options shown in your example.  There are several uncommitted patches still in progress to add these features.  &lt;A href="https://issues.apache.org/jira/browse/HADOOP-10578"&gt;HADOOP-10578&lt;/A&gt; implements "maxdepth".  &lt;A href="https://issues.apache.org/jira/browse/HADOOP-10579"&gt;HADOOP-10579&lt;/A&gt; implements "type".  These features are not yet available in any release of either HDP or Apache Hadoop.&lt;/P&gt;&lt;P&gt;Until these features become generally available, I think your only other option is to use wildcard glob matching as suggested in prior answers.  I understand you said that there is some variability to the names because of dates and times embedded into the names.  You would need to find a way to stage these files in a predictable way, so that you can effectively use a wildcard to match only the files that you want to match.  This might require renaming files or moving them into a different directory structure at time of ingest.&lt;/P&gt;&lt;P&gt;Another possible option could be to script it externally, such as by using bash to run an ls -R command, parse the results, and then call the Hadoop shell again using only the files that you want.  However, this would introduce overhead from needing to start a separate Hadoop shell process (a JVM) for each command, which might be unacceptable.&lt;/P&gt;</description>
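The external scripting option described above could look roughly like the following. This is a minimal sketch, not a tested recipe: it assumes the usual 8-column output of hdfs dfs -ls, it uses a plain (non-recursive) listing to get the effect of "maxdepth 1", and the function names list_plain_files and move_files_only are hypothetical helpers invented for illustration. It relies on hdfs dfs -mv accepting several sources followed by one destination directory, which keeps the cost to one JVM for the listing and one for the move.

```shell
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical helpers, not Hadoop commands.

# Read `hdfs dfs -ls` output on stdin and print the paths of regular files.
# Assumes the usual 8-column listing: column 1 is the permission string
# ("-..." for a file, "d..." for a directory) and column 8 is the path.
# Paths that contain whitespace are not handled.
list_plain_files() {
  awk '$1 ~ /^-/ { print $8 }'
}

# Move the files directly under $1 into $2, leaving subdirectories behind.
move_files_only() {
  local src="$1" dest="$2" files
  files=$(hdfs dfs -ls "$src" | list_plain_files)
  # Nothing to move: stop before calling the shell with no sources.
  [ -n "$files" ] || return 0
  # Intentionally unquoted so that each path becomes its own argument.
  # All sources go to a single -mv call, so only one extra JVM is started.
  hdfs dfs -mv $files "$dest"
}
```

With the layout from the question, calling move_files_only folder1 folder2 would move the three text files and leave "doNotCopy" in place, provided folder2 already exists. Swapping -mv for -cp gives a copy instead of a move.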
      <pubDate>Thu, 04 Feb 2016 01:45:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-move-files-but-not-the-directories-in-hdfs/m-p/123918#M17713</guid>
      <dc:creator>cnauroth</dc:creator>
      <dc:date>2016-02-04T01:45:47Z</dc:date>
    </item>
    <item>
      <title>Re: How do you move files but not the directories in hdfs?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-move-files-but-not-the-directories-in-hdfs/m-p/123919#M17714</link>
      <description>&lt;P&gt;I will look into it, thanks for your help&lt;/P&gt;</description>
      <pubDate>Thu, 04 Feb 2016 18:10:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-move-files-but-not-the-directories-in-hdfs/m-p/123919#M17714</guid>
      <dc:creator>daniel_perry</dc:creator>
      <dc:date>2016-02-04T18:10:12Z</dc:date>
    </item>
    <item>
      <title>Re: How do you move files but not the directories in hdfs?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-move-files-but-not-the-directories-in-hdfs/m-p/123920#M17715</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/381/cnauroth.html" nodeid="381"&gt;@Chris Nauroth&lt;/A&gt;  This is good information. Thanks for sharing this. &lt;/P&gt;</description>
      <pubDate>Thu, 04 Feb 2016 18:55:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-move-files-but-not-the-directories-in-hdfs/m-p/123920#M17715</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2016-02-04T18:55:37Z</dc:date>
    </item>
  </channel>
</rss>

