Master
Posts: 430
Registered: ‎07-01-2015

Solr indexing HDFS documents

Hi,

I went through the Solr tutorial, got a basic SolrCloud running, and created an index/core.

After posting some documents with bin/post -c mycore ~/mylib/*.py, I successfully indexed all my Python files.

 

But I couldn't find a tutorial or guide anywhere on how to index files stored in HDFS.

 

Can anybody point me in the right direction?

 

Thanks

T.

 

Posts: 1,892
Kudos: 432
Solutions: 302
Registered: ‎07-31-2013

Re: Solr indexing HDFS documents

Have you looked at the MapReduceIndexerTool documentation/tutorial? It is part of the Cloudera Search tutorial at http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/search_tutorial.htm... specifically under http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/search_batch_index_...
Contributor
Posts: 56
Registered: ‎02-09-2015

Re: Solr indexing HDFS documents

What about using FUSE to mount your HDFS, and then applying the same method you used for local files to index the data inside HDFS?
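A rough sketch of the FUSE approach, in case it helps. It assumes the hadoop-fuse-dfs package is installed and that you run it from the Solr install directory; the NameNode address, mount point, and HDFS directory are made-up placeholders, not values from this thread. By default it only prints the commands; set RUN=1 to actually execute them.

```shell
#!/usr/bin/env bash
# Sketch: mount HDFS through FUSE, then index the mounted files with bin/post.
set -eu

# Hypothetical placeholders; replace with values from your own cluster.
NAMENODE="${NAMENODE:-namenode.example.com:8020}"  # NameNode host:port (assumed)
MOUNT_POINT="${MOUNT_POINT:-/mnt/hdfs}"            # local mount point (assumed)
HDFS_DIR="${HDFS_DIR:-user/me/mylib}"              # directory inside HDFS (assumed)
COLLECTION="${COLLECTION:-mycore}"                 # core name from the original post

# Print each command by default; execute it only when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

run mkdir -p "$MOUNT_POINT"
run hadoop-fuse-dfs "dfs://$NAMENODE" "$MOUNT_POINT"
# Same bin/post call as for local files, pointed at the mounted HDFS path.
run bin/post -c "$COLLECTION" "$MOUNT_POINT/$HDFS_DIR"/*.py
```

This goes through a single client machine, so it is probably only reasonable for small data sets.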
Cloudera Employee
Posts: 275
Registered: ‎01-09-2014

Re: Solr indexing HDFS documents

You can use the MapReduceIndexerTool, coupled with the HdfsFindTool:

 

http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/search_hdfsfindtool...

 

The HdfsFindTool generates a list of files based on the find parameters you specify, and that list can be piped to the MapReduceIndexerTool with the '--input-list -' option (the '-' tells it to read the input list from standard input).
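As a sketch, the pipeline could look roughly like this. The jar path, HDFS directories, ZooKeeper address, and morphline file are all hypothetical placeholders that you will need to adapt; in particular, you need a morphline config that knows how to parse your files. The pipeline only runs when RUN=1 is set.

```shell
#!/usr/bin/env bash
# Sketch: list HDFS files with HdfsFindTool and feed the list to MapReduceIndexerTool.
set -eu

# All values below are hypothetical placeholders; adjust them to your cluster.
JOB_JAR="${JOB_JAR:-/usr/lib/solr/contrib/mr/search-mr-job.jar}"
INPUT_DIR="${INPUT_DIR:-hdfs:///user/me/mylib}"
NAME_GLOB="${NAME_GLOB:-*.py}"
MORPHLINE="${MORPHLINE:-morphline.conf}"
OUTPUT_DIR="${OUTPUT_DIR:-hdfs:///user/me/outdir}"
ZK_HOST="${ZK_HOST:-zk.example.com:2181/solr}"
COLLECTION="${COLLECTION:-mycore}"

# Step 1: print one HDFS path per line for every regular file matching NAME_GLOB.
list_files() {
  hadoop jar "$JOB_JAR" org.apache.solr.hadoop.HdfsFindTool \
    -find "$INPUT_DIR" -type f -name "$NAME_GLOB"
}

# Step 2: read the file list from standard input ('--input-list -') and index it.
index_from_stdin() {
  hadoop jar "$JOB_JAR" org.apache.solr.hadoop.MapReduceIndexerTool \
    --morphline-file "$MORPHLINE" \
    --output-dir "$OUTPUT_DIR" \
    --zk-host "$ZK_HOST" \
    --collection "$COLLECTION" \
    --go-live \
    --input-list -
}

# Run the pipeline only when explicitly requested.
if [ "${RUN:-0}" = "1" ]; then
  list_files | index_from_stdin
fi
```

Running list_files on its own first is a cheap way to check that the find parameters select the files you expect before starting the MapReduce job.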