<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question All Hdfs file names older than N days in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/All-Hdfs-file-names-older-than-N-days/m-p/328141#M230203</link>
    <description>&lt;P&gt;Dears,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Greetings!&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;I need to get the names of all HDFS files that are older than N days.&lt;/P&gt;&lt;P&gt;I have already listed all the last-level directories that are older, but the requirement is to get all the file names.&lt;/P&gt;&lt;P&gt;Kindly help with an HDFS command, script, or code for this.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also, please advise: if we query a Hive table and compute a SUM on one column, or join it with another table, will this change the timestamp of the underlying HDFS files of that Hive table? Obviously, a write or update to the table will change the timestamp of the respective HDFS files.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Appreciate a swift response.&lt;/P&gt;</description>
    <pubDate>Tue, 19 Oct 2021 18:28:44 GMT</pubDate>
    <dc:creator>DA-Ka</dc:creator>
    <dc:date>2021-10-19T18:28:44Z</dc:date>
    <item>
      <title>All Hdfs file names older than N days</title>
      <link>https://community.cloudera.com/t5/Support-Questions/All-Hdfs-file-names-older-than-N-days/m-p/328141#M230203</link>
      <description>&lt;P&gt;Dears,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Greetings!&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;I need to get the names of all HDFS files that are older than N days.&lt;/P&gt;&lt;P&gt;I have already listed all the last-level directories that are older, but the requirement is to get all the file names.&lt;/P&gt;&lt;P&gt;Kindly help with an HDFS command, script, or code for this.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also, please advise: if we query a Hive table and compute a SUM on one column, or join it with another table, will this change the timestamp of the underlying HDFS files of that Hive table? Obviously, a write or update to the table will change the timestamp of the respective HDFS files.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Appreciate a swift response.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Oct 2021 18:28:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/All-Hdfs-file-names-older-than-N-days/m-p/328141#M230203</guid>
      <dc:creator>DA-Ka</dc:creator>
      <dc:date>2021-10-19T18:28:44Z</dc:date>
    </item>
    <item>
      <title>Re: All Hdfs file names older than N days</title>
      <link>https://community.cloudera.com/t5/Support-Questions/All-Hdfs-file-names-older-than-N-days/m-p/328209#M230217</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/92462"&gt;@DA-Ka&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;The example below is inspired by this &lt;A href="https://stackoverflow.com/questions/44235019/delete-files-older-than-10days-on-hdfs" target="_self"&gt;link&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;1)&amp;nbsp;Use -t -R to list files recursively, sorted by modification time (newest first):&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000FF"&gt;# sudo -u hdfs hdfs dfs -ls -t -R /warehouse/tablespace/managed/hive/sample_07&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#0000FF"&gt;drwxrwx---+ - hive hive 0 2021-10-20 &lt;STRONG&gt;06:14&lt;/STRONG&gt; /warehouse/tablespace/managed/hive/sample_07/.hive-staging_hive_2021-10-20_06-13-50_654_7549698524549477159-1&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#0000FF"&gt;drwxrwx---+ - hive hive 0 2021-10-20 &lt;STRONG&gt;06:13&lt;/STRONG&gt; /warehouse/tablespace/managed/hive/sample_07/delta_0000001_0000001_0000&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#0000FF"&gt;-rw-rw----+ 3 hive hive 48464 2021-10-20 &lt;STRONG&gt;06:13&lt;/STRONG&gt; /warehouse/tablespace/managed/hive/sample_07/delta_0000001_0000001_0000/000000_0&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;FONT color="#000000"&gt;2) Filter the entries whose timestamp is at or before a cutoff:&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000FF"&gt;# sudo -u hdfs hdfs dfs -ls -t -R /warehouse/tablespace/managed/hive/sample_07 |awk -v dateA="2021-10-20 06:13" '{if (($6" "$7) &amp;lt;= dateA) {print ($6" "$7" "$8)}}'&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#0000FF"&gt;2021-10-20 &lt;STRONG&gt;06:13&lt;/STRONG&gt; /warehouse/tablespace/managed/hive/sample_07/delta_0000001_0000001_0000&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#0000FF"&gt;2021-10-20 &lt;STRONG&gt;06:13&lt;/STRONG&gt; /warehouse/tablespace/managed/hive/sample_07/delta_0000001_0000001_0000/000000_0&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regarding your last question, whether a SUM or JOIN could change the timestamp: I'm not sure. Please try it, then use the commands above to check the timestamps.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Regards,&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Will&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;If the answer helps, please accept it as the solution and click thumbs up.&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 20 Oct 2021 08:01:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/All-Hdfs-file-names-older-than-N-days/m-p/328209#M230217</guid>
      <dc:creator>willx</dc:creator>
      <dc:date>2021-10-20T08:01:55Z</dc:date>
    </item>
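The answer above filters on a fixed timestamp and also prints directories, while the question asks for the names of files older than N days. A minimal sketch, assuming the default `hdfs dfs -ls -R` column layout (permissions, replication, owner, group, size, date, time, path), paths without spaces, and GNU `date` for computing the cutoff. It keeps only lines whose permission string starts with `-` (regular files) and compares the zero-padded `YYYY-MM-DD HH:MM` strings lexically, which matches chronological order. The function name `files_older_than` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `hdfs dfs -ls -R` output on stdin and print the
# path of every regular file whose date/time column is at or before CUTOFF.
# Assumed column layout: perms repl owner group size date time path
files_older_than() {
  local cutoff="$1"   # "YYYY-MM-DD HH:MM", same format as the -ls output
  # $1 ~ /^-/ keeps regular files (directory lines start with "d").
  # Plain string comparison is safe because the timestamps are zero-padded
  # and fixed-width, so lexical order equals chronological order.
  awk -v cutoff="$cutoff" '$1 ~ /^-/ { if (cutoff >= ($6 " " $7)) print $8 }'
}

# Hypothetical usage on a cluster (GNU date assumed for -d):
# N=7
# cutoff=$(date -d "-${N} days" '+%Y-%m-%d %H:%M')
# sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/sample_07 |
#   files_older_than "$cutoff"
```

This assumes `date` and the `hdfs` client run in the same local time zone, since `-ls` prints local times.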
    <item>
      <title>Re: All Hdfs file names older than N days</title>
      <link>https://community.cloudera.com/t5/Support-Questions/All-Hdfs-file-names-older-than-N-days/m-p/328268#M230232</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/92462"&gt;@DA-Ka&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;SUM and JOIN only read the table's data, so they won't change the timestamps of the underlying files.&lt;/P&gt;&lt;P&gt;Example:&lt;/P&gt;&lt;P&gt;create table mytable (i int,j int,k int);&lt;BR /&gt;insert into mytable values (1,2,3),(4,5,6),(7,8,9);&lt;BR /&gt;create table mytable2 (i int,j int,k int);&lt;BR /&gt;insert into mytable2 values (1,2,6),(3,5,7),(4,8,9);&lt;/P&gt;&lt;P&gt;select * from mytable;&lt;BR /&gt;+------------+------------+------------+&lt;BR /&gt;| mytable.i | mytable.j | mytable.k |&lt;BR /&gt;+------------+------------+------------+&lt;BR /&gt;| 1 | 2 | 3 |&lt;BR /&gt;| 4 | 5 | 6 |&lt;BR /&gt;| 7 | 8 | 9 |&lt;BR /&gt;+------------+------------+------------+&lt;/P&gt;&lt;P&gt;select * from mytable2;&lt;BR /&gt;+-------------+-------------+-------------+&lt;BR /&gt;| mytable2.i | mytable2.j | mytable2.k |&lt;BR /&gt;+-------------+-------------+-------------+&lt;BR /&gt;| 1 | 2 | 6 |&lt;BR /&gt;| 3 | 5 | 7 |&lt;BR /&gt;| 4 | 8 | 9 |&lt;BR /&gt;+-------------+-------------+-------------+&lt;/P&gt;&lt;P&gt;# sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/mytable&lt;BR /&gt;drwxrwx---+ - hive hive 0 &lt;STRONG&gt;2021-10-20 15:11&lt;/STRONG&gt; /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000&lt;BR /&gt;-rw-rw----+ 3 hive hive 743 &lt;STRONG&gt;2021-10-20 15:12&lt;/STRONG&gt; /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000/bucket_00000_0&lt;/P&gt;&lt;P&gt;# sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/mytable2&lt;BR /&gt;drwxrwx---+ - hive hive 0 &lt;STRONG&gt;2021-10-20 15:23&lt;/STRONG&gt; /warehouse/tablespace/managed/hive/mytable2/delta_0000001_0000001_0000&lt;BR /&gt;-rw-rw----+ 3 hive hive 742 &lt;STRONG&gt;2021-10-20 15:23&lt;/STRONG&gt; /warehouse/tablespace/managed/hive/mytable2/delta_0000001_0000001_0000/bucket_00000_0&lt;/P&gt;&lt;P&gt;1. SUM: the timestamp is unchanged&lt;/P&gt;&lt;P&gt;select pos+1 as col,sum(val) as sum_col&lt;BR /&gt;from mytable t lateral view posexplode(array(*)) pe&lt;BR /&gt;group by pos;&lt;/P&gt;&lt;P&gt;+------+----------+&lt;BR /&gt;| col | sum_col |&lt;BR /&gt;+------+----------+&lt;BR /&gt;| 2 | 15 |&lt;BR /&gt;| 1 | 12 |&lt;BR /&gt;| 3 | 18 |&lt;BR /&gt;+------+----------+&lt;/P&gt;&lt;P&gt;# sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/mytable&lt;BR /&gt;drwxrwx---+ - hive hive 0 &lt;STRONG&gt;2021-10-20 15:11&lt;/STRONG&gt; /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000&lt;BR /&gt;-rw-rw----+ 3 hive hive 743 &lt;STRONG&gt;2021-10-20 15:12&lt;/STRONG&gt; /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000/bucket_00000_0&lt;/P&gt;&lt;P&gt;2. Inner JOIN: the timestamp is unchanged&lt;/P&gt;&lt;P&gt;select * from&lt;BR /&gt;(select * from mytable)T1&lt;BR /&gt;join&lt;BR /&gt;(select * from mytable2)T2&lt;BR /&gt;on T1.i=T2.i;&lt;/P&gt;&lt;P&gt;+-------+-------+-------+-------+-------+-------+&lt;BR /&gt;| t1.i | t1.j | t1.k | t2.i | t2.j | t2.k |&lt;BR /&gt;+-------+-------+-------+-------+-------+-------+&lt;BR /&gt;| 1 | 2 | 3 | 1 | 2 | 6 |&lt;BR /&gt;| 4 | 5 | 6 | 4 | 8 | 9 |&lt;BR /&gt;+-------+-------+-------+-------+-------+-------+&lt;/P&gt;&lt;P&gt;# sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/mytable&lt;BR /&gt;drwxrwx---+ - hive hive 0 &lt;STRONG&gt;2021-10-20 15:11&lt;/STRONG&gt; /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000&lt;BR /&gt;-rw-rw----+ 3 hive hive 743 &lt;STRONG&gt;2021-10-20 15:12&lt;/STRONG&gt; /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000/bucket_00000_0&lt;BR /&gt;# sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/mytable2&lt;BR /&gt;drwxrwx---+ - hive hive 0 &lt;STRONG&gt;2021-10-20 15:23&lt;/STRONG&gt; /warehouse/tablespace/managed/hive/mytable2/delta_0000001_0000001_0000&lt;BR /&gt;-rw-rw----+ 3 hive hive 742 &lt;STRONG&gt;2021-10-20 15:23&lt;/STRONG&gt; /warehouse/tablespace/managed/hive/mytable2/delta_0000001_0000001_0000/bucket_00000_0&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Regards,&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Will&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 20 Oct 2021 15:44:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/All-Hdfs-file-names-older-than-N-days/m-p/328268#M230232</guid>
      <dc:creator>willx</dc:creator>
      <dc:date>2021-10-20T15:44:09Z</dc:date>
    </item>
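To check the SUM/JOIN behavior on your own tables, one option is to save the recursive listing before and after running the query and compare the two snapshots. A minimal sketch, assuming the same `-ls -R` column layout as in the listings above and paths without spaces; `changed_paths` and the snapshot file names are hypothetical. It prints paths that are new in the second snapshot or whose date/time column changed; deleted paths are not reported, and because `-ls` shows minutes only, a change within the same minute would be missed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: changed_paths BEFORE AFTER
# Both arguments are files holding `hdfs dfs -ls -R` output.
# Prints each path that is new in AFTER or whose "date time" differs.
changed_paths() {
  awk 'NF >= 8 {
         key = $6 " " $7                      # modification date and time
         if (FILENAME == ARGV[1]) { ts[$8] = key; next }
         if (!($8 in ts) || ts[$8] != key) print $8
       }' "$1" "$2"
}

# Hypothetical usage (table path from the example above):
# T=/warehouse/tablespace/managed/hive/mytable
# sudo -u hdfs hdfs dfs -ls -R "$T" > before.txt
# (run the SUM or JOIN query in beeline here)
# sudo -u hdfs hdfs dfs -ls -R "$T" > after.txt
# changed_paths before.txt after.txt   # no output: no modification time changed
```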
    <item>
      <title>Re: All Hdfs file names older than N days</title>
      <link>https://community.cloudera.com/t5/Support-Questions/All-Hdfs-file-names-older-than-N-days/m-p/328396#M230255</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/92462"&gt;@DA-Ka&lt;/a&gt;&amp;nbsp;On releases that still ship it, you can use the HDFS Find tool "org.apache.solr.hadoop.HdfsFindTool" for that purpose.&lt;/P&gt;&lt;P&gt;Refer to the link below, which describes a method to find old files.&lt;/P&gt;&lt;P&gt;- &lt;A href="http://35.204.180.114/static/help/topics/search_hdfsfindtool.html" target="_blank"&gt;http://35.204.180.114/static/help/topics/search_hdfsfindtool.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, &lt;SPAN&gt;the search-based HDFS find tool has been removed and is superseded in CDH 6 by the native "hdfs dfs -find" command, documented here:&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://hadoop.apache.org/docs/r3.1.2/hadoop-project-dist/hadoop-common/FileSystemShell.html#find" target="_blank" rel="nofollow noopener noreferrer"&gt;https://hadoop.apache.org/docs/r3.1.2/hadoop-project-dist/hadoop-common/FileSystemShell.html#find&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 21 Oct 2021 07:49:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/All-Hdfs-file-names-older-than-N-days/m-p/328396#M230255</guid>
      <dc:creator>PabitraDas</dc:creator>
      <dc:date>2021-10-21T07:49:17Z</dc:date>
    </item>
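The native `hdfs dfs -find` described in the linked Hadoop 3.1.2 documentation matches on names (`-name`, `-iname`) and prints with `-print` or `-print0`; as far as I know it has no modification-time test, so file age still has to be checked separately. One sketch pairs it with `hdfs dfs -stat '%Y'`, which prints the modification time in milliseconds since the epoch, and filters numerically, which avoids depending on the `-ls` date format. The helper names `older_than_ms` and `cutoff_ms_days_ago` are hypothetical, and the cluster loop (left as comments) starts a JVM for every path, so it will be slow on large trees.

```bash
#!/usr/bin/env bash
# Hypothetical helper: older_than_ms CUTOFF_MS
# Reads "MTIME_MS PATH" lines on stdin and prints each PATH whose
# modification time is at or before CUTOFF_MS. Paths may contain spaces.
older_than_ms() {
  awk -v cutoff="$1" '{
    t = $1                   # modification time in epoch milliseconds
    sub(/^[^ ]+ /, "")       # drop the first field, keep the full path
    if (cutoff + 0 >= t + 0) print
  }'
}

# Hypothetical helper: epoch milliseconds for "now minus N days".
cutoff_ms_days_ago() {
  echo $(( ($(date +%s) - $1 * 86400) * 1000 ))
}

# Hypothetical usage on a cluster:
# root=/warehouse/tablespace/managed/hive/sample_07
# hdfs dfs -find "$root" |
#   while IFS= read -r p; do
#     hdfs dfs -test -f "$p" || continue        # skip directories
#     printf '%s %s\n' "$(hdfs dfs -stat '%Y' "$p")" "$p"
#   done | older_than_ms "$(cutoff_ms_days_ago 7)"
```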
  </channel>
</rss>

