Created on 03-23-2015 02:07 AM - edited 09-16-2022 02:25 AM
I am using Solr 4.4 on CDH 5.3.1 and was wondering whether it is possible to index an unstructured log file into Solr and search for specific words in its text. I don't have a schema for the file; it is just a plain text file. If it is possible, how can I configure Solr through Cloudera Manager to do this?
Created 12-31-2015 04:48 AM
I found this URL very helpful, so if anyone is facing this problem, it will help a lot:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
But for SolrCloud, there's another good way:
1- Configure the Data Import Handler (DIH) in solrconfig.xml by adding this part after any existing request handler in the file:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">DIHconfigfile.xml</str>
  </lst>
</requestHandler>
2- Create the Data Import Handler config file "DIHconfigfile.xml" and place it next to solrconfig.xml.
You can find more about DIH here (check the FileDataSource part):
https://wiki.apache.org/solr/DataImportHandler
3- Reload the core.
4- From the Solr web UI, start indexing the file or files you specified in the DIH config.
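For step 2, a DIHconfigfile.xml for plain text logs might look roughly like the sketch below. It is only an illustration, not a tested config: the baseDir path, the file name pattern, and the target field names "id" and "text" are assumptions and have to match your own directory layout and schema.xml. It uses FileListEntityProcessor to walk the directory and PlainTextEntityProcessor to read each file as one document.

```xml
<dataConfig>
  <!-- Reads file contents from the local file system of the Solr node -->
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Outer entity: lists the files; baseDir and fileName are assumed values -->
    <entity name="files"
            processor="FileListEntityProcessor"
            baseDir="/path/to/logs"
            fileName=".*\.(log|txt)"
            recursive="true"
            rootEntity="false"
            dataSource="null">
      <!-- Use the absolute file path as the unique key (assumes the key field is "id") -->
      <field column="fileAbsolutePath" name="id"/>
      <!-- Inner entity: loads the whole file as a single text value -->
      <entity name="content"
              processor="PlainTextEntityProcessor"
              url="${files.fileAbsolutePath}">
        <!-- "text" is an assumed field name; it must exist in schema.xml -->
        <field column="plainText" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

With this layout each file becomes one Solr document, so a word search returns the matching files rather than individual log lines. The directory must be readable on the node that runs the import.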
Happy indexing
Created 03-23-2015 04:22 AM
I found a command line that takes the files in a directory and indexes them recursively:
java -classpath /opt/cloudera/parcels/CDH/lib/solr/solr-core-4.4.0-cdh5.3.1.jar -Dauto=yes -Dc=testing -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool mydata/
But I got an error:
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update..
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, max depth=999, delay=0s
Indexing directory mydata (9 files, depth=0)
POSTing file Word_Count_input - Copy (4).txt (text/plain)
SimplePostTool: WARNING: Solr returned an error #404 Not Found
It doesn't commit the changes either, so nothing is written to Solr.