New Contributor
Posts: 3
Registered: ‎03-13-2019

org.apache.solr.hadoop.HdfsFindTool not available in Cloudera 6.1


After upgrading Cloudera to 6.1, org.apache.solr.hadoop.HdfsFindTool seems to be no longer available in search-mr-job.jar:

hadoop jar /opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/lib/solr/contrib/mr/search-mr-job.jar org.apache.solr.hadoop.HdfsFindTool
WARNING: Use "yarn jar" to launch YARN applications.
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.solr.hadoop.HdfsFindTool
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:306)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:227)

 

Are there any alternatives? Is there a way to run an older version alongside the 6.1 and newer stack? Is this class contained in any other library?

Posts: 1,903
Kudos: 435
Solutions: 305
Registered: ‎07-31-2013

Re: org.apache.solr.hadoop.HdfsFindTool not available in Cloudera 6.1

The search-based HDFS find tool has been removed and is superseded in C6 by the native "hdfs dfs -find" command, documented here: https://hadoop.apache.org/docs/r3.1.2/hadoop-project-dist/hadoop-common/FileSystemShell.html#find
New Contributor
Posts: 3
Registered: ‎03-13-2019

Re: org.apache.solr.hadoop.HdfsFindTool not available in Cloudera 6.1

The problem is that the native find doesn't seem to accept the parameters that the old tool or the OS-level find accepts, and I cannot find documentation of the supported expressions and their syntax:

hdfs dfs -find /example/path -mtime +14
find: Unexpected argument: -mtime
New Contributor
Posts: 3
Registered: ‎03-13-2019

Re: org.apache.solr.hadoop.HdfsFindTool not available in Cloudera 6.1

So, if I am correct, the native hdfs find accepts only two kinds of primary expression. From the documentation:

The following primary expressions are recognised:

-name pattern
-iname pattern

Evaluates as true if the basename of the file matches the pattern using standard file system globbing. If -iname is used then the match is case insensitive.

-print
-print0

Always evaluates to true. Causes the current pathname to be written to standard output. If the -print0 expression is used then an ASCII NULL character is appended.

That makes it useless for finding files older than a given age, which was the main use case for HdfsFindTool. Is there any chance of HdfsFindTool being brought back? Or is there a workaround to make it work with newer Cloudera releases?
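
One possible workaround, offered as a rough sketch rather than a supported replacement for HdfsFindTool: list the tree recursively with "hdfs dfs -ls -R" and filter on the modification date column. The helper name older_than is made up for illustration, and the sketch assumes the usual eight-column listing layout (permissions, replication, owner, group, size, date, time, path). It compares whole dates only, so it is coarser than find's -mtime.

```shell
#!/bin/sh
# Sketch only: filters `hdfs dfs -ls -R` output read from stdin and prints
# the paths of regular files last modified before a cutoff date.
# older_than is a hypothetical helper name, not part of Hadoop.
older_than() {
  # $1: cutoff date as YYYY-MM-DD; ISO dates sort correctly as plain strings.
  awk -v cutoff="$1" '
    # Regular files start with "-"; this skips directories ("d...")
    # and the "Found N items" summary lines.
    /^-/ && ($6 "") < (cutoff "") {
      # Field 8 onward is the path; rejoin it in case it contains spaces
      # (runs of several spaces would be collapsed to one).
      path = $8
      for (i = 9; i <= NF; i++) path = path " " $i
      print path
    }'
}

# Example use on a cluster (GNU date assumed for the "14 days ago" syntax):
#   hdfs dfs -ls -R /example/path | older_than "$(date -d '14 days ago' +%Y-%m-%d)"
```

The listing is read from stdin so the filter can be tried against a saved listing before it is pointed at a live cluster.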