Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Solr schemaless text search

Expert Contributor

I am using Solr 4.4 on CDH 5.3.1 and was wondering whether it is possible to load an unstructured log file into Solr and search for specific words in its text. I don't have a schema for the file; it's just a text file. If it is possible, how can Solr be configured to do this using Cloudera Manager?

1 ACCEPTED SOLUTION

Expert Contributor

I found this URL very helpful, so it should help anyone facing the same problem:

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

For SolrCloud, there is another good way:

1- Configure the Data Import Handler (DIH) in solrconfig.xml.

Add this part after any request handler inside the file:

 

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">DIHconfigfile.xml</str>
  </lst>
</requestHandler>

 

2- Create the Data Import Handler file "DIHconfigfile.xml" and place it next to solrconfig.xml.

You can find more about DIH below (check the file data source part):

https://wiki.apache.org/solr/DataImportHandler
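
As a rough illustration of the file data source approach, here is a sketch that writes a minimal DIHconfigfile.xml which walks a directory and indexes each log file as one document. The directory /var/log/myapp, the .log file pattern, the data source name fileReader, and the target field names id and text are all assumptions for the example; the field names have to match fields that exist in your schema.xml. FileDataSource, FileListEntityProcessor, and PlainTextEntityProcessor are the standard DIH classes.

```shell
# Sketch only: writes a minimal DIH config into the current directory.
# /var/log/myapp, the .log pattern, and the "id"/"text" field names are
# assumptions; adjust them to your files and your schema.xml.
# The quoted 'EOF' keeps the shell from expanding ${files.fileAbsolutePath}.
cat > DIHconfigfile.xml <<'EOF'
<dataConfig>
  <!-- Reads file contents from the local filesystem -->
  <dataSource type="FileDataSource" name="fileReader" encoding="UTF-8"/>
  <document>
    <!-- Outer entity: lists matching files, creates no documents itself -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/var/log/myapp" fileName=".*\.log"
            recursive="true" rootEntity="false" dataSource="null">
      <field column="fileAbsolutePath" name="id"/>
      <!-- Inner entity: one Solr document per file, whole content in "text" -->
      <entity name="content" processor="PlainTextEntityProcessor"
              dataSource="fileReader" url="${files.fileAbsolutePath}">
        <field column="plainText" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
EOF
```

The files are read by the Solr node that runs the import, so the baseDir has to exist on that machine.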

 

3- Reload the core.

4- From the Solr web UI, start indexing the file or files you specified in the DIH config.
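
Steps 3 and 4 can also be driven over HTTP instead of the web UI. A minimal sketch, assuming Solr listens on localhost:8983 and the core is named testing (both are assumptions, as is the helper name run_import). It uses the CoreAdmin RELOAD action; for a SolrCloud collection the reload would normally go through the Collections API instead.

```shell
# Assumed values: change host, port, and core name to match your setup.
SOLR_URL="http://localhost:8983/solr"
CORE="testing"

# Step 3: CoreAdmin RELOAD makes Solr pick up the edited solrconfig.xml.
RELOAD_URL="${SOLR_URL}/admin/cores?action=RELOAD&core=${CORE}"

# Step 4: start a full import through the /dataimport handler, then check it.
IMPORT_URL="${SOLR_URL}/${CORE}/dataimport?command=full-import"
STATUS_URL="${SOLR_URL}/${CORE}/dataimport?command=status"

# Wrapped in a function so nothing is sent until you call run_import.
run_import() {
  curl -s "$RELOAD_URL" && curl -s "$IMPORT_URL" && curl -s "$STATUS_URL"
}

echo "$RELOAD_URL"
echo "$IMPORT_URL"
```

Keep in mind that a full-import cleans the existing index by default before it loads the new documents.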

 

Happy indexing 


2 REPLIES 2

Expert Contributor

I found a command line that takes the files in a directory and indexes them recursively:

 

java -classpath /opt/cloudera/parcels/CDH/lib/solr/solr-core-4.4.0-cdh5.3.1.jar -Dauto=yes -Dc=testing -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool mydata/

 

 

But I got an error:

 

SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update..
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, max depth=999, delay=0s
Indexing directory mydata (9 files, depth=0)
POSTing file Word_Count_input - Copy (4).txt (text/plain)
SimplePostTool: WARNING: Solr returned an error #404 Not Found

 

It doesn't commit the changes either, so nothing is written to Solr.
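
One possible reading of the output above: the tool posted to http://localhost:8983/solr/update, with no core name in the path, which suggests that -Dc=testing was not picked up by this version of SimplePostTool. Something to try, as an untested sketch, is to pass the full update URL with -Durl instead. The helper name post_files is made up for the example. Also note that in auto mode plain-text files are sent to the /update/extract handler (Solr Cell), so that handler has to be defined in solrconfig.xml as well.

```shell
# Untested sketch: same command as above, but with the update URL spelled out.
# The core name "testing" and localhost:8983 are taken from the post above.
SOLR_JAR="/opt/cloudera/parcels/CDH/lib/solr/solr-core-4.4.0-cdh5.3.1.jar"
UPDATE_URL="http://localhost:8983/solr/testing/update"

# Nothing is posted until the function is called with a file or directory.
post_files() {
  java -classpath "$SOLR_JAR" \
    -Durl="$UPDATE_URL" -Dauto=yes -Ddata=files -Drecursive=yes -Dcommit=yes \
    org.apache.solr.util.SimplePostTool "$@"
}

# Example call (needs a running Solr): post_files mydata/
echo "$UPDATE_URL"
```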
