Support Questions
Find answers, ask questions, and share your expertise

Solr indexing HDFS documents

Solr indexing HDFS documents

Master Collaborator


 I checked the Solr tutorial and managed to run a basic SolrCloud and created and index/core.

After posting som documents bin/post -c mycore ~/mylib/*.py I succesfully indexed all my python files.


But I didnt find nowhere a tutorial/guide how to index files stored inside the HDFS.


Can anybody point me in the right direction?






Re: Solr indexing HDFS documents

Master Guru
Have you taken a look at the MapReduceIndexer tool documentation/tutorial? It is part of the Cloudera Search tutorial at specifically under

Re: Solr indexing HDFS documents

Expert Contributor
what about using FUSE to mount your hdfs and apply same method you used to index local files , to index the data inside your HDFS

Re: Solr indexing HDFS documents

Super Collaborator

You can use the MapReduceIndexerTool, coupled with the hdfsFindTool:


The hdfsFindTool will generate a list of documents, based on the find parameters you've specified, and then those can be piped to the MapReduceIndexerTool with the '--input-list -' option (to specify the input-list is coming from std input)