solr schema less in text search

Expert Contributor

I am using Solr 4.4 on CDH 5.3.1 and was wondering whether it is possible to insert an unstructured log file into Solr and search for specific words in its text. I don't have a schema for the file; it is just a text file. If this is possible, how can Solr be configured to do it using Cloudera Manager?

1 ACCEPTED SOLUTION

Expert Contributor

I found this URL very helpful, so it should help a lot if anyone else is facing this problem:

 

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika 

 

For SolrCloud, though, there is another good way:

 

1- Configure the Data Import Handler (DIH) in solrconfig.xml.

Add this block after any existing request handler in the file:

 

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">DIHconfigfile.xml</str>
  </lst>
</requestHandler>

 

2- Create the Data Import Handler configuration file "DIHconfigfile.xml" and place it next to solrconfig.xml.

 

You can find more about DIH at the link below (see the FileDataSource section):

https://wiki.apache.org/solr/DataImportHandler

 

3- Reload the core.

 

4- From the Solr web UI, you can start indexing the file or files you specified in the DIH configuration.
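
As a rough sketch of steps 2 to 4 from the command line (not a tested recipe), something like the following might work. Everything named here is an assumption made for illustration: the collection and instance directory name "testing", the local instance directory ./testing_config, the log directory /var/log/myapp, and the schema fields "id" and "text". The DIH config uses FileListEntityProcessor to list the files and PlainTextEntityProcessor to read each file's whole content into one document.

```shell
# Hypothetical names; adjust to your own cluster and schema.
COLLECTION="testing"
INSTANCE_DIR="./testing_config"        # local copy of the config, contains conf/
SOLR_URL="http://localhost:8983/solr"

# Step 2: write DIHconfigfile.xml next to solrconfig.xml.
mkdir -p "$INSTANCE_DIR/conf"
cat > "$INSTANCE_DIR/conf/DIHconfigfile.xml" <<'EOF'
<dataConfig>
  <dataSource name="fds" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Outer entity only lists matching files; it creates no documents. -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/var/log/myapp" fileName=".*\.log" recursive="true"
            rootEntity="false" dataSource="null">
      <!-- Inner entity turns each file into one document. -->
      <entity name="content" processor="PlainTextEntityProcessor"
              dataSource="fds" url="${files.fileAbsolutePath}"
              transformer="TemplateTransformer">
        <!-- Assumes the schema has "id" and "text" fields. -->
        <field column="id" template="${files.fileAbsolutePath}"/>
        <field column="plainText" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
EOF

# Step 3: push the changed config to ZooKeeper, then reload the collection.
# Assumes the instance directory was created under the same name.
push_and_reload() {
  solrctl instancedir --update "$COLLECTION" "$INSTANCE_DIR"
  solrctl collection --reload "$COLLECTION"
}

# Step 4: build the DIH URL for a given command and call it over HTTP.
dih_url() {
  printf '%s/%s/dataimport?command=%s' "$SOLR_URL" "$COLLECTION" "$1"
}

# Note: full-import cleans the index first unless clean=false is added.
start_import()  { curl -s "$(dih_url full-import)"; }
import_status() { curl -s "$(dih_url status)"; }
```

With the request handler from step 1 already in solrconfig.xml, the idea would be to run push_and_reload, then start_import, and then call import_status until the import reports that it has finished. The same import can still be started from the Solr web UI as described in step 4.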

 

Happy indexing 


REPLIES

Expert Contributor

I found a command line that takes the files in a directory and indexes them recursively:

 

java -classpath /opt/cloudera/parcels/CDH/lib/solr/solr-core-4.4.0-cdh5.3.1.jar -Dauto=yes -Dc=testing -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool mydata/

 

 

But I got an error:

 

SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update..
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, max depth=999, delay=0s
Indexing directory mydata (9 files, depth=0)
POSTing file Word_Count_input - Copy (4).txt (text/plain)
SimplePostTool: WARNING: Solr returned an error #404 Not Found

 

It doesn't commit the changes either, so nothing is written to Solr.
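
One thing that might be worth trying: the log shows the tool posting to http://localhost:8983/solr/update, with no collection name in the path, so the -Dc=testing option does not appear to have taken effect. The sketch below passes the full update URL with -Durl instead. It assumes Solr is listening on localhost:8983, that the collection is named "testing", and that an /update/extract handler (Solr Cell) is configured for it, since as far as I can tell auto mode sends .txt and .log files there. I have not verified that this resolves the 404.

```shell
# Assumed values; the jar path is the one from the command above.
SOLR_JAR="/opt/cloudera/parcels/CDH/lib/solr/solr-core-4.4.0-cdh5.3.1.jar"
UPDATE_URL="http://localhost:8983/solr/testing/update"

# Post every supported file under the given directory, then commit.
post_dir() {
  java -classpath "$SOLR_JAR" \
    -Durl="$UPDATE_URL" -Dauto=yes -Drecursive=yes -Dcommit=yes -Ddata=files \
    org.apache.solr.util.SimplePostTool "$1"
}
```

Usage would be post_dir mydata/ from the directory that contains mydata.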
