solr schema less in text search


Expert Contributor

I am using Solr 4.4 on CDH 5.3.1 and was wondering whether it is possible to load an unstructured log file into Solr and search it for specific words. I don't have a schema for the file; it is just a text file. If it is possible, how can Solr be configured to do this using Cloudera Manager?

1 ACCEPTED SOLUTION


Re: solr schema less in text search

Expert Contributor

I found this page very helpful, so it should help anyone facing the same problem:

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

For SolrCloud, though, there is another good way:

1- Configure the Data Import Handler (DIH) in solrconfig.xml.

Add this part after any existing request handler in the file:

 

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">DIHconfigfile.xml</str>
  </lst>
</requestHandler>

 

2- Create the Data Import Handler configuration file "DIHconfigfile.xml" and place it next to solrconfig.xml.

More about DIH can be found here (see the FileDataSource section):

https://wiki.apache.org/solr/DataImportHandler

 

3- Reload the core.

4- From the Solr web UI, start indexing the file or files you specified in the DIH configuration.
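
Steps 2 to 4 could be sketched roughly as follows for a directory of plain-text logs. This is only a sketch under assumptions, not a tested CDH recipe: the instance directory, the data directory /path/to/mydata, the collection name testing, and the Solr fields id and content are all placeholders, schema.xml must already define those two fields, and the DIH jar must be on Solr's classpath.

```shell
#!/bin/sh
# All names below are placeholders (assumptions), not values from the original post.
INSTANCE_DIR=./testing_instancedir        # local copy of the collection's instance dir
COLLECTION=testing                        # hypothetical collection name
SOLR_URL=http://localhost:8983/solr       # assumed Solr base URL

# Step 2: write DIHconfigfile.xml next to solrconfig.xml (inside conf/).
mkdir -p "$INSTANCE_DIR/conf"
cat > "$INSTANCE_DIR/conf/DIHconfigfile.xml" <<'EOF'
<dataConfig>
  <dataSource type="FileDataSource" name="files" encoding="UTF-8"/>
  <document>
    <!-- Outer entity: lists .txt/.log files under baseDir (placeholder path) -->
    <entity name="file" processor="FileListEntityProcessor" dataSource="null"
            baseDir="/path/to/mydata" fileName=".*\.(txt|log)"
            recursive="true" rootEntity="false">
      <field column="fileAbsolutePath" name="id"/>
      <!-- Inner entity: reads each whole file into the "plainText" column -->
      <entity name="text" processor="PlainTextEntityProcessor" dataSource="files"
              url="${file.fileAbsolutePath}">
        <field column="plainText" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
EOF

# Step 3: push the changed config to ZooKeeper and reload (solrctl ships with CDH).
push_and_reload() {
  solrctl instancedir --update "$COLLECTION" "$INSTANCE_DIR" &&
    solrctl collection --reload "$COLLECTION"
}

# Step 4: start a full import over HTTP instead of clicking in the web UI.
start_import() {
  curl -s "$SOLR_URL/$COLLECTION/dataimport?command=full-import&clean=false&commit=true"
}
```

Calling push_and_reload and then start_import would run the import; the same /dataimport URL with command=status reports progress. Each file becomes one document, so a search on the content field finds the files that contain a given word.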

 

Happy indexing 

2 REPLIES

Re: solr schema less in text search

Expert Contributor

I found a command that takes the files in a directory and indexes them recursively:

 

java -classpath /opt/cloudera/parcels/CDH/lib/solr/solr-core-4.4.0-cdh5.3.1.jar -Dauto=yes -Dc=testing -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool mydata/

 

 

But I got an error:

 

SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update..
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, max depth=999, delay=0s
Indexing directory mydata (9 files, depth=0)
POSTing file Word_Count_input - Copy (4).txt (text/plain)
SimplePostTool: WARNING: Solr returned an error #404 Not Found

 

It also does not commit the changes, so nothing is written to Solr.
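
The log shows the tool posting to http://localhost:8983/solr/update, with no collection name in the path, so -Dc=testing apparently had no effect in this version, which may be what causes the 404. One thing to try, sketched here on the assumption that the collection is named testing and Solr listens on localhost:8983, is passing the full update URL with -Durl. As far as I recall, auto mode sends plain-text files to the /update/extract handler, so that handler would also have to be enabled in solrconfig.xml; this is a guess at the cause, not a confirmed fix.

```shell
#!/bin/sh
# Assumed values: adjust host, port and collection name to your cluster.
SOLR_URL=http://localhost:8983/solr
COLLECTION=testing
UPDATE_URL="$SOLR_URL/$COLLECTION/update"
SOLR_JAR=/opt/cloudera/parcels/CDH/lib/solr/solr-core-4.4.0-cdh5.3.1.jar

# Same tool and flags as above, but with an explicit collection URL
# in place of -Dc. Arguments are the files or directories to post.
post_files() {
  java -classpath "$SOLR_JAR" \
    -Dauto=yes -Durl="$UPDATE_URL" -Ddata=files -Drecursive=yes \
    org.apache.solr.util.SimplePostTool "$@"
}
```

Usage would then be: post_files mydata/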
