We have 150 GB of structured data in Oracle. We are transferring the data to HDFS and building a Solr index on it. Using the Solr API, we provide interfaces to a front-end application for querying the records. This process is very lengthy and time-consuming. If any records change, we have to index again.
We are looking for an alternative using Spark.
Does anyone have any suggestions on this? Our goal is to get fast query results.
Updating records shouldn't be that lengthy. If you use the REST API to update records, you should be fine for small batches. For larger updates, using morphlines with the batch indexer works like a charm. As long as you insert/update the new records using the same unique "id" field, the existing documents get overwritten with the new data, so there is no need to rebuild the whole index.
Update via REST:
curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @books.json -H 'Content-type:application/json'
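If you would rather push only the changed records from a script than from a file, the same request can be built in code. This is just a rough sketch, not a tested setup: it assumes the schema's unique key is "id", reuses the URL from the curl command above, and the helper name build_update_request and the sample fields ("title", "price") are made up for illustration.

```python
import json
import urllib.request

# Assumption: same endpoint as the curl command above.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update/json?commit=true"


def build_update_request(docs, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST request carrying docs as a JSON array.

    Hypothetical helper: each doc must carry the unique "id" field so that
    Solr overwrites the existing document instead of adding a duplicate.
    """
    for doc in docs:
        if "id" not in doc:
            raise ValueError("every document needs the unique 'id' field")
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Only the records that changed in Oracle; the values here are invented.
changed = [
    {"id": "book-1", "title": "Solr in Action", "price": 39.99},
]
request = build_update_request(changed)

# Sending needs a running Solr instance, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Posting a document with id "book-1" again later, with different field values, replaces the earlier version rather than creating a second copy.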