11-14-2018 04:36 AM - edited 11-14-2018 04:36 AM
I need to delete a large number of files from the /tmp directory in HDFS based on their modification time. I planned to use the 'find' command, but it appears that it doesn't support the -mtime option:
hdfs dfs -help find

-find <path> ... <expression> ... :
  Finds all files that match the specified expression and
  applies selected actions to them. If no <path> is specified
  then defaults to the current working directory. If no
  expression is specified then defaults to -print.

  The following primary expressions are recognised:
    -name pattern
    -iname pattern
      Evaluates as true if the basename of the file matches the
      pattern using standard file system globbing.
      If -iname is used then the match is case insensitive.

    -print
    -print0
      Always evaluates to true. Causes the current pathname to be
      written to standard output followed by a newline. If the -print0
      expression is used then an ASCII NULL character is appended rather
      than a newline.

  The following operators are recognised:
    expression -a expression
    expression -and expression
    expression expression
      Logical AND operator for joining two expressions. Returns
      true if both child expressions return true. Implied by the
      juxtaposition of two expressions and so does not need to be
      explicitly specified. The second expression will not be
      applied if the first fails.
We use CDH 5.11, and I'd expect the find tool to support more options, as described in https://issues.apache.org/jira/browse/HADOOP-8989, which was integrated in CDH 5.5.
Was this change rolled back in later versions?
Alternatively, I was thinking about using HdfsFindTool, e.g.:

hadoop jar /opt/cloudera/parcels/CDH-5.11.2-1.cdh5.11.2.p0.4/jars/search-mr-1.0.0-cdh5.11.2-job.jar org.apache.solr.hadoop.HdfsFindTool -find /tmp -type f -mtime +15

but I couldn't find a way to use the -exec option here. It always returns an "Unknown command" error, e.g.:
find: Unknown command: -ls
Do you perhaps know how to use this option properly?
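A possible fallback that avoids find altogether would be to filter the output of hdfs dfs -ls -R by its date column. The following is only a minimal sketch, assuming the usual eight-column listing format (permissions, replication, owner, group, size, date, time, path); older_than is a hypothetical helper name, not part of any Hadoop tool, and it has not been verified against a CDH 5.11 cluster.

```shell
#!/usr/bin/env bash
# older_than CUTOFF
# Reads `hdfs dfs -ls -R` style lines on stdin and prints the paths of
# regular files whose modification date is earlier than CUTOFF (YYYY-MM-DD).
# Assumed line layout: perms replication owner group size date time path
older_than() {
  local cutoff="$1"
  awk -v cutoff="$cutoff" '
    # Regular files have permissions starting with "-" (directories use "d").
    # ISO dates sort correctly when compared as plain strings.
    $1 ~ /^-/ && $6 < cutoff {
      # Rebuild the path from field 8 onward in case it contains spaces
      # (runs of several spaces inside a name would be collapsed to one).
      path = $8
      for (i = 9; i <= NF; i++) path = path " " $i
      print path
    }'
}

# Possible usage (GNU date and GNU xargs assumed; review the list before deleting):
#   cutoff=$(date -d '15 days ago' +%F)
#   hdfs dfs -ls -R /tmp | older_than "$cutoff" > /tmp/candidates.txt
#   xargs -r -d '\n' hdfs dfs -rm < /tmp/candidates.txt
```

Writing the candidate list to a file first makes it possible to inspect what would be removed before anything is actually deleted.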
Thanks a million.