I need to delete a lot of files from the /tmp directory in HDFS based on their modification time. I planned to use the 'find' command, but it appears that it doesn't support the -mtime option:
hdfs dfs -help find
-find <path> ... <expression> ... :
Finds all files that match the specified expression and
applies selected actions to them. If no <path> is specified
then defaults to the current working directory. If no
expression is specified then defaults to -print.
The following primary expressions are recognised:
-name pattern
-iname pattern
Evaluates as true if the basename of the file matches the
pattern using standard file system globbing.
If -iname is used then the match is case insensitive.
-print
-print0
Always evaluates to true. Causes the current pathname to be
written to standard output followed by a newline. If the -print0
expression is used then an ASCII NULL character is appended rather
than a newline.
The following operators are recognised:
expression -a expression
expression -and expression
Logical AND operator for joining two expressions. Returns
true if both child expressions return true. Implied by the
juxtaposition of two expressions and so does not need to be
explicitly specified. The second expression will not be
applied if the first fails.
Alternatively, I was thinking about using HdfsFindTool, e.g.:
hadoop jar /opt/cloudera/parcels/CDH-5.11.2-1.cdh5.11.2.p0.4/jars/search-mr-1.0.0-cdh5.11.2-job.jar org.apache.solr.hadoop.HdfsFindTool -find /tmp -type f -mtime +15
However, I couldn't find a way to use the -exec option with it. It always fails with an "Unknown command" error, e.g.:
find: Unknown command: -ls
Do you know how to use this option properly?
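As a fallback that needs neither -mtime nor -exec, the recursive listing could be filtered by its date column. The following is only a rough sketch, not a tested solution: `older_than` is a hypothetical helper name, and it assumes the usual 8-column `hdfs dfs -ls -R` output (permissions, replication, owner, group, size, date, time, path), paths without whitespace, and GNU `date`/`xargs` in the commented usage. It compares calendar dates only, so it approximates `-mtime +15` rather than matching it exactly.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads `hdfs dfs -ls -R` style lines on stdin and prints
# the paths of regular files whose modification date (column 6, YYYY-MM-DD)
# is strictly earlier than the cutoff date passed as the first argument.
# Directories (lines starting with "d") and summary lines are skipped.
older_than() {
  awk -v cutoff="$1" '
    # Appending "" forces a string comparison; ISO dates sort lexicographically.
    $1 ~ /^-/ && NF >= 8 && ($6 "") < (cutoff "") { print $8 }
  '
}

# Possible usage (left commented out because it deletes data):
#   cutoff=$(date -d "15 days ago" +%Y-%m-%d)     # GNU date assumed
#   hdfs dfs -ls -R /tmp | older_than "$cutoff" | xargs -r -n 100 hdfs dfs -rm
```

Running the pipeline without the final `xargs` stage first prints the candidate paths, which makes it possible to check the selection before anything is removed.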