Created 02-03-2017 05:18 AM
Hi All,
I understand that Solr creates an index and makes searches faster. However, I have a fundamental question:
Does Solr store the data as well as the index? For example, suppose I have a table with 100 columns and I want an index on only a few of them.
Will Solr store all of the table data so that it can show the full row on a search match,
OR
can the full file stay in HDFS/HBase, with Solr looking it up to show the full row?
In other words, can the data live in HDFS with the primary/secondary indexes in Solr, so that a search can find the full data in HDFS? And not only find it, but also update and delete it?
Thanks,
Avijeet
Created 02-03-2017 05:29 PM
I would recommend reading the following link:
http://www.solrtutorial.com/basic-solr-concepts.html
First, to answer your question: you cannot keep your data in HBase/HDFS and have Solr create an index that searches that data in place. Solr searches only its own index. Here is the concept:
Data stored in Solr is organized as documents (in database terms, each document is like a row in a table). Before you can store data in Solr, you have to define a schema in a file called schema.xml (similar to a table schema in a database). This is where you specify whether each field (think of a column in a database) is indexed, stored, or both. You already understand the index, which is what Solr uses to search. But what the hell is "stored"? Well, are you only going to want the indexed fields back? Assume a document with 50 fields. Maybe you want to search on only 5 of them. When you get your search results back, you probably want more than those 5 indexed fields, so what you get back are your stored fields. The more fields you store and index, the higher the storage requirements.
Read that link and you'll have a good idea. And to reiterate my earlier point: no, you cannot have data in HDFS/HBase and index it in place from Solr. Solr is a complete solution. Solr can use HDFS to store its own data and index files, but it's not going to create an index on your HBase files or your ORC/text files on HDFS.
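To make the indexed/stored distinction concrete, here is a minimal Python sketch. It is a toy model only, not Solr's API or storage format: the `SCHEMA` dictionary, the `ToyIndex` class, and the sample documents are all made up for illustration. It just mimics the idea that indexed fields can be searched while only stored fields come back in results.

```python
from collections import defaultdict

# Hypothetical schema: each field is flagged as indexed and/or stored,
# loosely mirroring the two attributes discussed above.
SCHEMA = {
    "id":    {"indexed": True,  "stored": True},
    "name":  {"indexed": True,  "stored": True},
    "city":  {"indexed": True,  "stored": False},  # searchable, never returned
    "notes": {"indexed": False, "stored": True},   # returned, not searchable
}


class ToyIndex:
    """Toy model only: keeps search terms and stored values separately."""

    def __init__(self, schema):
        self.schema = schema
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids
        self.stored = {}                  # doc id -> stored fields only

    def add(self, doc):
        doc_id = doc["id"]
        # Keep only the fields flagged as stored.
        self.stored[doc_id] = {
            f: v for f, v in doc.items() if self.schema[f]["stored"]
        }
        # Make only the fields flagged as indexed searchable.
        for field, value in doc.items():
            if self.schema[field]["indexed"]:
                for term in str(value).lower().split():
                    self.postings[(field, term)].add(doc_id)

    def search(self, field, term):
        ids = sorted(self.postings.get((field, term.lower()), set()))
        return [self.stored[i] for i in ids]


index = ToyIndex(SCHEMA)
index.add({"id": "1", "name": "Alice", "city": "Pune", "notes": "first row"})
index.add({"id": "2", "name": "Bob", "city": "Austin", "notes": "second row"})

hits = index.search("city", "pune")
print(hits)                             # the match comes back without "city"
print(index.search("notes", "first"))   # "notes" is not indexed, so no match
```

Searching on `city` finds the document, but `city` itself is missing from the result because it was not stored; `notes` is returned but cannot be searched.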
Created 02-03-2017 09:05 PM
@mqureshi made great points. Also note that you do not have to store any fields in Solr. Each field can be set independently: stored=true/false and indexed=true/false. Of course, if stored=false you won't see the value in the results, but at a minimum you will see the "uniqueKey", which would be your "id" field. You could also look at the HBase Indexer: https://community.hortonworks.com/articles/1181/hbase-indexing-to-solr-with-hdp-search-in-hdp-%2023....
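One way to picture the "only the id comes back" case is a two-step lookup: search returns ids, and the application fetches the full rows from wherever they actually live. The Python sketch below is only an illustration under that assumption. `EXTERNAL_ROWS`, `CITY_INDEX`, `search_ids`, and `fetch_full_rows` are hypothetical stand-ins, not Solr or HBase calls, and keeping the index in sync with updates and deletes is left out entirely.

```python
# Stand-in for the system of record (for example, a table keyed by row id).
EXTERNAL_ROWS = {
    "row-1": {"id": "row-1", "name": "Alice", "city": "Pune", "balance": 120},
    "row-2": {"id": "row-2", "name": "Bob", "city": "Austin", "balance": 75},
}

# Stand-in for a search index in which "city" is indexed but not stored,
# so a match can only hand back the unique key of each document.
CITY_INDEX = {"pune": ["row-1"], "austin": ["row-2"]}


def search_ids(city):
    """Return the unique keys of documents whose city matches."""
    return list(CITY_INDEX.get(city.lower(), []))


def fetch_full_rows(ids, store):
    """Look up the complete rows in the external store by id."""
    return [store[i] for i in ids if i in store]


ids = search_ids("Pune")
rows = fetch_full_rows(ids, EXTERNAL_ROWS)
print(ids)
print(rows)
```

The trade-off in this pattern is that the application (or an indexing tool) has to keep the ids in the index consistent with the rows in the external store.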
Created 07-07-2020 04:34 AM
Solr includes the terms of the content you specify in an index.
Indexing in Solr is similar to building the index at the back of a book: it lists the words that appear in the book and where they appear. In effect, we take an inventory of the words in the book and of the pages on which each word appears.
That is, by including content in the index, we make that content available for Solr to search.
This type of index, called an inverted index, is a way of structuring the information that will be retrieved by a search engine.
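The book analogy can be sketched in a few lines of Python. This is a minimal illustration of the inverted-index idea only, not how Solr implements it; the `build_inverted_index` function and the sample `pages` are made up for the example.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each word to the sorted list of page numbers it appears on."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for raw_word in text.lower().split():
            word = raw_word.strip(".,;:!?")  # drop surrounding punctuation
            if word:
                index[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}


# Hypothetical "book": page number -> text on that page.
pages = {
    1: "Solr builds an index.",
    2: "The index maps words to pages.",
    3: "Search the index, not the book.",
}

inverted = build_inverted_index(pages)
print(inverted["index"])           # pages on which "index" appears
print(inverted.get("hadoop", []))  # a word that is not in the book
```

A search for a word then becomes a direct lookup in `inverted` instead of a scan through every page.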
You can find a longer explanation of how information is stored and retrieved by Solr at https://www.solr-tutorial.com/indexing-with-solr.html