Posts: 56
Registered: ‎02-09-2015
Accepted Solution

solr schema less in text search

I am using Solr 4.4 on CDH 5.3.1 and was wondering whether it is possible to index an unstructured log file in Solr and then search it for specific words. I don't have a schema for the file; it is just a text file. If it is possible, how can I configure Solr to do this using Cloudera Manager?

Re: solr schema less in text search

I found a command that takes the files in a directory and indexes them recursively:


java -classpath /opt/cloudera/parcels/CDH/lib/solr/solr-core-4.4.0-cdh5.3.1.jar -Dauto=yes -Dc=testing -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool mydata/



But I got an error:


SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update..
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, max depth=999, delay=0s
Indexing directory mydata (9 files, depth=0)
POSTing file Word_Count_input - Copy (4).txt (text/plain)
SimplePostTool: WARNING: Solr returned an error #404 Not Found


It also doesn't commit the changes, so nothing is written to Solr.
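
The log shows the tool posting to http://localhost:8983/solr/update, with no collection in the path, which suggests that -Dc=testing had no effect in this SimplePostTool version and may be what causes the 404. A possible workaround is to pass the full update URL with -Durl. This is only a sketch: it assumes the collection is really named "testing", that Solr listens on localhost:8983 as in the log, and that the collection's solrconfig.xml defines an /update/extract handler (as far as I know, auto mode sends .txt and .log files there).

```shell
# Assumptions: collection "testing" exists and Solr listens on localhost:8983.
COLLECTION="testing"
UPDATE_URL="http://localhost:8983/solr/${COLLECTION}/update"
JAR="/opt/cloudera/parcels/CDH/lib/solr/solr-core-4.4.0-cdh5.3.1.jar"

if [ -f "$JAR" ]; then
  # -Durl overrides the default base URL so the request reaches the collection.
  java -classpath "$JAR" -Durl="$UPDATE_URL" -Dauto=yes -Ddata=files \
       -Drecursive=yes org.apache.solr.util.SimplePostTool mydata/
else
  echo "SimplePostTool jar not found at $JAR" >&2
fi
```

If the 404 persists with the collection in the URL, the extract handler is probably not configured for that collection.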

Re: solr schema less in text search

I found this URL very helpful, so if anyone is facing this problem, it will help a lot.


But for SolrCloud, there's another good way:


1- Configure the Data Import Handler (DIH) in solrconfig.xml.

Add this part after any existing request handler in the file:


<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">DIHconfigfile.xml</str>
  </lst>
</requestHandler>


2- Create the Data Import Handler config file "DIHconfigfile.xml" and place it next to solrconfig.xml.


Below you can find more about DIH (check the file data source part).


3- Reload the core.


4- From the Solr web UI, you can start indexing the file or files you specified in the DIH config.
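
Steps 2 to 4 above could look roughly like the script below. It is a sketch under several assumptions that are not in the original post: the collection and its config are both named "testing", a local copy of the config lives in ./testing_conf/conf, the files to index are under the placeholder path /path/to/mydata, and schema.xml has "id" and "text" fields. The DIH config uses FileListEntityProcessor to list the files and PlainTextEntityProcessor to read each one as a single text value. The dih helper is a hypothetical convenience function that calls the /dataimport handler over HTTP instead of using the web UI.

```shell
# Assumed names: collection "testing", local config copy in ./testing_conf/conf.
COLLECTION="testing"
INSTANCE_DIR="./testing_conf"
SOLR_URL="http://localhost:8983/solr"

# Step 2: write the DIH config next to solrconfig.xml.
# The quoted 'EOF' stops the shell from expanding ${files.fileAbsolutePath}.
mkdir -p "$INSTANCE_DIR/conf"
cat > "$INSTANCE_DIR/conf/DIHconfigfile.xml" <<'EOF'
<dataConfig>
  <dataSource name="fds" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Outer entity lists the files; /path/to/mydata is a placeholder. -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/mydata" fileName=".*\.(txt|log)"
            recursive="true" rootEntity="false" dataSource="null">
      <field column="fileAbsolutePath" name="id"/>
      <!-- Inner entity reads each file's content as one plain-text value. -->
      <entity name="content" processor="PlainTextEntityProcessor"
              dataSource="fds" url="${files.fileAbsolutePath}">
        <field column="plainText" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
EOF

# Step 3: push the config to ZooKeeper and reload the collection.
# Runs only if solrctl is installed and the local copy holds a full config.
if command -v solrctl >/dev/null 2>&1 && [ -f "$INSTANCE_DIR/conf/solrconfig.xml" ]; then
  solrctl instancedir --update "$COLLECTION" "$INSTANCE_DIR"
  solrctl collection --reload "$COLLECTION"
fi

# Step 4: helper that sends a DIH command (for example full-import or status).
dih() {
  curl -s "$SOLR_URL/$COLLECTION/dataimport?command=$1"
}
```

With the script sourced on a host that can reach Solr, `dih full-import` would start the import and `dih status` would report its progress.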


Happy indexing