org.apache.solr.hadoop.HdfsFindTool not available in Cloudera 6.1

New Contributor

After upgrading Cloudera to 6.1, org.apache.solr.hadoop.HdfsFindTool seems to be no longer available in search-mr-job.jar:

 

hadoop jar /opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/lib/solr/contrib/mr/search-mr-job.jar org.apache.solr.hadoop.HdfsFindTool
WARNING: Use "yarn jar" to launch YARN applications.
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.solr.hadoop.HdfsFindTool
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:306)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:227)

 

Are there any alternatives? Is there a way to run an older version alongside the 6.1 and newer stack? Is this class contained in any other library?

 


Mentor
The search-based HDFS find tool has been removed and is superseded in C6 by the native "hdfs dfs -find" command, documented here: https://hadoop.apache.org/docs/r3.1.2/hadoop-project-dist/hadoop-common/FileSystemShell.html#find

New Contributor

The problem is that find doesn't seem to accept the parameters that the old tool or the OS-level find accepts, and I cannot find documentation of the valid expressions and their syntax:

 

hdfs dfs -find /example/path -mtime +14
find: Unexpected argument: -mtime

New Contributor

So, if I am correct, the native hdfs find accepts only two kinds of expression:

 

The following primary expressions are recognised:

-name pattern
-iname pattern

Evaluates as true if the basename of the file matches the pattern using standard file system globbing. If -iname is used then the match is case insensitive.

-print
-print0

Always evaluates to true. Causes the current pathname to be written to standard output. If the -print0 expression is used then an ASCII NULL character is appended.

This makes it useless for finding files older than a given age, which was the main use case for HdfsFindTool. Is there any chance of HdfsFindTool being brought back? Or is there a workaround to make it work with newer Cloudera?

New Contributor

I have the same issue as @lmdrone. The Hadoop 'find' command only supports two kinds of expression, and Cloudera has removed the org.apache.solr.hadoop.HdfsFindTool utility. How do we filter files based on modification time? Please bring back "org.apache.solr.hadoop.HdfsFindTool".
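
One possible workaround, offered as a rough sketch rather than a replacement for HdfsFindTool: filter the output of "hdfs dfs -ls -R" by its date column. The function name filter_older_than is hypothetical. It assumes the usual ls line layout (permissions, replication, owner, group, size, date as YYYY-MM-DD, time, path) and compares whole days only, so it does not reproduce the exact semantics of -mtime +14.

```shell
#!/usr/bin/env bash
# filter_older_than CUTOFF (hypothetical helper, not part of Hadoop)
# Reads `hdfs dfs -ls -R` output on stdin and prints the paths of files
# (directories are skipped) whose modification date is before CUTOFF.
# CUTOFF must be in YYYY-MM-DD form so that string comparison orders dates.
filter_older_than() {
  awk -v cutoff="$1" '
    # Ignore short lines such as "Found N items" and directory entries.
    NF >= 8 && $1 !~ /^d/ && $6 < cutoff {
      # Rebuild the path from field 8 onward in case it contains spaces
      # (runs of several spaces inside a path are collapsed to one).
      path = $8
      for (i = 9; i <= NF; i++) path = path " " $i
      print path
    }'
}

# Example invocation (assumes GNU date for the -d option):
#   hdfs dfs -ls -R /example/path | filter_older_than "$(date -d '14 days ago' +%Y-%m-%d)"
```

Because the comparison uses only the date column, a file modified on the cutoff day itself is never reported, regardless of the time of day.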