
Solr indexing HDFS documents

Master Collaborator

Hi,

I followed the Solr tutorial and managed to run a basic SolrCloud instance and create an index/core.

After posting some documents with bin/post -c mycore ~/mylib/*.py, I successfully indexed all my Python files.

But I couldn't find a tutorial or guide anywhere on how to index files stored in HDFS.

Can anybody point me in the right direction?

Thanks

T.

3 replies

Re: Solr indexing HDFS documents

Master Guru
Have you taken a look at the MapReduceIndexerTool documentation? It is part of the Cloudera Search tutorial at http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/search_tutorial.htm..., specifically the batch indexing section at http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/search_batch_index_...
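
To give a feel for what the batch job looks like, here is a minimal Python sketch that only assembles (and prints) a MapReduceIndexerTool command indexing everything under one HDFS directory. The jar path, morphline file, HDFS paths, ZooKeeper address and collection name are hypothetical placeholders; the options used ('--morphline-file', '--output-dir', '--go-live', '--zk-host', '--collection') are the ones that appear in the Cloudera Search examples, but please check them against the tool's built-in help on your version. The morphline file is what tells the job how to turn each input file into Solr documents, so you will need one suited to your data.

```python
import shlex

# Hypothetical placeholder: use the real search-mr job jar from your installation.
JOB_JAR = "/path/to/search-mr-job.jar"


def build_index_command(morphline_file, output_dir, zk_host, collection, input_dirs):
    """Return the argv for a MapReduceIndexerTool run over whole HDFS directories."""
    return [
        "hadoop", "jar", JOB_JAR,
        "org.apache.solr.hadoop.MapReduceIndexerTool",
        "--morphline-file", morphline_file,  # how each input file becomes Solr documents
        "--output-dir", output_dir,          # HDFS directory for the generated index shards
        "--go-live",                         # merge the finished shards into the live collection
        "--zk-host", zk_host,                # ZooKeeper ensemble of the SolrCloud cluster
        "--collection", collection,
        *input_dirs,                         # HDFS input paths go last, as positional arguments
    ]


if __name__ == "__main__":
    cmd = build_index_command(
        morphline_file="morphlines.conf",       # hypothetical
        output_dir="hdfs:///user/t/outdir",     # hypothetical
        zk_host="zk01:2181/solr",               # hypothetical
        collection="mycore",
        input_dirs=["hdfs:///user/t/mylib"],    # hypothetical
    )
    # Print instead of running, so the sketch is safe to try without a cluster.
    print(shlex.join(cmd))
```

Printing rather than running keeps the sketch harmless; run the printed command on a node where the Hadoop client is configured.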

Re: Solr indexing HDFS documents

Expert Contributor
What about using FUSE to mount your HDFS, and then indexing the data stored in it with the same bin/post method you used for your local files?
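
A minimal sketch of the FUSE route in Python, assuming the hadoop-fuse-dfs client from CDH is installed on the machine where you run bin/post (mounting usually needs root). The NameNode address, mount point, HDFS directory and core name are hypothetical placeholders. By default it only prints the two commands; passing execute=True runs them in order and stops at the first failure.

```python
import shlex
import subprocess


def fuse_index_commands(namenode, mount_point, hdfs_dir, core):
    """Return argv lists: mount HDFS via FUSE, then post one directory to Solr."""
    mount = ["hadoop-fuse-dfs", f"dfs://{namenode}", mount_point]
    # Once mounted, HDFS paths appear under the mount point like local files,
    # so bin/post can read them the same way it read ~/mylib.
    local_view = mount_point.rstrip("/") + "/" + hdfs_dir.lstrip("/")
    post = ["bin/post", "-c", core, local_view]
    return [mount, post]


def show_or_run(commands, execute=False):
    """Print each command; optionally execute it, raising on a non-zero exit."""
    for cmd in commands:
        print(shlex.join(cmd))
        if execute:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    show_or_run(fuse_index_commands(
        "namenode.example.com:8020",  # hypothetical NameNode host:port
        "/mnt/hdfs",                  # hypothetical mount point
        "/user/t/mylib",              # hypothetical HDFS directory
        "mycore",
    ))
```

Because every file is then read through the mount by a single bin/post process, this is probably best suited to modest amounts of data; for large volumes the MapReduce-based tools spread the work across the cluster.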

Re: Solr indexing HDFS documents

Super Collaborator

You can use the MapReduceIndexerTool together with the HdfsFindTool:

http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/search_hdfsfindtool...

The HdfsFindTool generates a list of files matching the find parameters you specify, and that list can be piped into the MapReduceIndexerTool using the '--input-list -' option (the '-' tells it to read the input list from standard input).
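
The same pipeline sketched in Python: it builds the HdfsFindTool and MapReduceIndexerTool commands and connects them the way a shell pipe would, so the file list printed by the first lands on the second's standard input. The jar path, morphline file, HDFS paths, ZooKeeper address and collection name are hypothetical placeholders, and the HdfsFindTool options ('-find', '-type f', '-name') follow the Cloudera Search examples; verify both tools' options on your release. The __main__ section only prints the commands.

```python
import shlex
import subprocess

# Hypothetical placeholder: the search-mr job jar shipped with Cloudera Search.
JOB_JAR = "/path/to/search-mr-job.jar"


def find_command(root, name_pattern):
    """HdfsFindTool: list regular files under root whose name matches the pattern."""
    return ["hadoop", "jar", JOB_JAR, "org.apache.solr.hadoop.HdfsFindTool",
            "-find", root, "-type", "f", "-name", name_pattern]


def index_command(morphline_file, output_dir, zk_host, collection):
    """MapReduceIndexerTool reading its input file list from stdin ('--input-list -')."""
    return ["hadoop", "jar", JOB_JAR, "org.apache.solr.hadoop.MapReduceIndexerTool",
            "--morphline-file", morphline_file, "--output-dir", output_dir,
            "--go-live", "--zk-host", zk_host, "--collection", collection,
            "--input-list", "-"]


def run_pipeline(producer, consumer):
    """Equivalent of 'producer | consumer' without a shell; returns consumer's exit code."""
    upstream = subprocess.Popen(producer, stdout=subprocess.PIPE)
    downstream = subprocess.Popen(consumer, stdin=upstream.stdout)
    upstream.stdout.close()  # lets the producer get SIGPIPE if the consumer exits early
    downstream.wait()
    upstream.wait()
    return downstream.returncode


if __name__ == "__main__":
    find = find_command("hdfs:///user/t/mylib", "*.py")  # hypothetical path
    index = index_command("morphlines.conf", "hdfs:///user/t/outdir",
                          "zk01:2181/solr", "mycore")     # hypothetical values
    print(shlex.join(find), "|", shlex.join(index))
```

Calling run_pipeline(find, index) would actually launch the job. Passing argument lists instead of a shell string means the '*.py' pattern reaches HdfsFindTool unexpanded, which is what you want, since the matching has to happen against HDFS and not the local filesystem.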
