Created on 03-20-2019 03:55 AM - edited 09-16-2022 07:14 AM
After upgrading Cloudera to 6.1, org.apache.solr.hadoop.HdfsFindTool seems to be no longer available in search-mr-job.jar:
hadoop jar /opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/lib/solr/contrib/mr/search-mr-job.jar org.apache.solr.hadoop.HdfsFindTool
WARNING: Use "yarn jar" to launch YARN applications.
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.solr.hadoop.HdfsFindTool
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:306)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:227)
Are there any alternatives? Is there a way to run an older version alongside the 6.1 and newer stack? Is this class contained in any other library?
Created 03-21-2019 05:48 PM
Created 03-22-2019 02:35 AM
The problem is that find doesn't seem to accept the parameters that the old tool or the OS-level find accepts, and I cannot find documentation of the supported expressions and their syntax:
hdfs dfs -find /example/path -mtime +14
find: Unexpected argument: -mtime
Created 03-25-2019 04:51 AM
So, if I am correct, the native hdfs find accepts only two expressions:
The following primary expressions are recognised:
  -name pattern
  -iname pattern
    Evaluates as true if the basename of the file matches the pattern using standard file system globbing. If -iname is used then the match is case insensitive.
  -print
  -print0
    Always evaluates to true. Causes the current pathname to be written to standard output. If the -print0 expression is used then an ASCII NULL character is appended.
This makes it useless for finding files older than a given age, which was the main use case for HdfsFindTool. Is there any chance of HdfsFindTool being brought back? Or is there a workaround to make it work with newer Cloudera versions?
Created 02-19-2021 03:02 AM
I have the same issue as @lmdrone. The Hadoop 'find' command only supports two expressions, and Cloudera has removed the org.apache.solr.hadoop.HdfsFindTool utility. How do we filter files based on modified time? Please bring back "org.apache.solr.hadoop.HdfsFindTool".
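One possible workaround, offered only as a rough sketch and not as a replacement for HdfsFindTool: list the tree with hdfs dfs -ls -R and filter on the modification date client-side. The snippet below assumes the usual eight-column ls output (permissions, replication, owner, group, size, date, time, path) with the date printed as YYYY-MM-DD HH:MM in the client's local time zone. The function names files_older_than and list_old_files are made up for this example, and the age check only approximates what -mtime +14 did in the old tool.

```python
import subprocess
from datetime import datetime, timedelta


def parse_ls_line(line):
    """Parse one line of `hdfs dfs -ls -R` output into (mtime, path).

    Returns None for lines that do not look like a file entry,
    such as "Found N items" headers.
    """
    # Assumed columns: perms, replication, owner, group, size, date, time, path.
    # maxsplit=7 keeps spaces inside the path intact.
    parts = line.rstrip("\n").split(None, 7)
    if len(parts) < 8:
        return None
    try:
        mtime = datetime.strptime(parts[5] + " " + parts[6], "%Y-%m-%d %H:%M")
    except ValueError:
        return None
    return mtime, parts[7]


def files_older_than(lines, days, now=None):
    """Return paths of regular files last modified more than `days` days ago."""
    cutoff = (now or datetime.now()) - timedelta(days=days)
    result = []
    for line in lines:
        if line.startswith("d"):  # skip directory entries
            continue
        parsed = parse_ls_line(line)
        if parsed is not None and parsed[0] < cutoff:
            result.append(parsed[1])
    return result


def list_old_files(path, days):
    """Hypothetical wrapper: run `hdfs dfs -ls -R` and filter its output."""
    proc = subprocess.run(
        ["hdfs", "dfs", "-ls", "-R", path],
        capture_output=True, text=True, check=True,
    )
    return files_older_than(proc.stdout.splitlines(), days)
```

Calling list_old_files("/example/path", 14) would then print-ready return the paths that the old -mtime +14 query was meant to find, give or take the minute-level resolution of the ls output. On very large directory trees the recursive listing can be slow, so this is probably best suited to housekeeping jobs rather than interactive use.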