
HDFS find and search utilities


Hi!

I need to delete a lot of files from the /tmp directory in HDFS based on their modification time. I planned to use the 'find' command, but it appears that it doesn't support the -mtime option:

hdfs dfs -help find
-find <path> ... <expression> ... :
  Finds all files that match the specified expression and
  applies selected actions to them. If no <path> is specified
  then defaults to the current working directory. If no
  expression is specified then defaults to -print.
  
  The following primary expressions are recognised:
    -name pattern
    -iname pattern
      Evaluates as true if the basename of the file matches the
      pattern using standard file system globbing.
      If -iname is used then the match is case insensitive.
  
    -print
    -print0
      Always evaluates to true. Causes the current pathname to be
      written to standard output followed by a newline. If the -print0
      expression is used then an ASCII NULL character is appended rather
      than a newline.
  
  The following operators are recognised:
    expression -a expression
    expression -and expression
    expression expression
      Logical AND operator for joining two expressions. Returns
      true if both child expressions return true. Implied by the
      juxtaposition of two expressions and so does not need to be
      explicitly specified. The second expression will not be
      applied if the first fails.

We use CDH 5.11, and I'd expect the find tool to support more options, as described in https://issues.apache.org/jira/browse/HADOOP-8989, which was integrated in CDH 5.5:

https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_new_in_cdh_55.html

Was this change rolled back in later versions?

 

Alternatively, I was thinking about using HdfsFindTool, e.g.:

hadoop jar /opt/cloudera/parcels/CDH-5.11.2-1.cdh5.11.2.p0.4/jars/search-mr-1.0.0-cdh5.11.2-job.jar org.apache.solr.hadoop.HdfsFindTool -find /tmp -type f -mtime +15

but I couldn't find a way to use the -exec option here. It always returns an "Unknown command" error, e.g.:

find: Unknown command: -ls

 

Do you know how to use this option properly?
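
If neither tool works out, one fallback I'm considering is to filter the output of hdfs dfs -ls -R on its date column and pass the result to hdfs dfs -rm. Below is a rough sketch of that idea, not a tested solution. The helper name filter_old_files is made up for illustration, and I'm assuming the listing has eight columns with the date in yyyy-MM-dd form in column 6 and the path starting at column 8. The commented usage lines also assume GNU date and GNU xargs, and paths without whitespace.

```shell
# Sketch only: print regular files from `hdfs dfs -ls -R` output (read on
# stdin) whose modification date is earlier than the YYYY-MM-DD cutoff in $1.
# filter_old_files is a hypothetical helper name, not part of Hadoop.
filter_old_files() {
  awk -v cutoff="$1" '
    # Assumed columns: perms, replication, owner, group, size, date, time, path.
    # Skip directories (perms start with "d") and short lines such as "Found N items".
    NF >= 8 && $1 !~ /^d/ && ($6 "") < (cutoff "") {
      path = $8
      # Re-join paths containing single spaces (runs of spaces are not preserved).
      for (i = 9; i <= NF; i++) path = path " " $i
      print path
    }'
}

# Intended usage (not run here; assumes GNU date/xargs and paths without whitespace):
#   cutoff=$(date -d "15 days ago" +%Y-%m-%d)
#   hdfs dfs -ls -R /tmp | filter_old_files "$cutoff" | xargs -r -n 100 hdfs dfs -rm
```

The dates are compared as strings, which works only because the yyyy-MM-dd format sorts chronologically. Running the pipeline without the final xargs step first would show which paths would be deleted.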

 

Thanks a million.
