Created on 08-04-201611:04 AM - edited 08-17-201910:57 AM
Joining Collections in SOLR (Part 1)
Sometimes you may want to inner join data from one solr connection to another. There is a facility to perform this action using a join query in SOLR. The easiest way to perform the join is by linking a single attribute from one collection to another attribute in another collection. This join works very well for standalone indexes, but does not work well for distributed indexes. To do this in a distributed index, we’ll perform that in part II of this article.
To demonstrate, let’s say we have two collections. Sales, which contains the amount of sales by region. And in the other collection called People, which has people categorized by their region and a flag if they are a manager. Let’s say our goal is to find all of the sales by manager. To do this, we will join the collections using region as our join key, and also filter the people data by if they are a manager or not.
Here is the filter query (fq) in solr on how to make this happen:
We'll populate it with data using the Solr Admin UI. Select the Sales core, then choose Documents. Document Type should be CSV, paste the values below into the text box and then click Submit Document. Very simple way to index sample data.
If you would like to run the same functionality using compounded join keys (i.e. 2 or more join keys). The best things to do is concatenate those keys on ingest to create a single join key.
Additionally, this functionality does not work with distributed indexes, i.e. multiple shards. If you try to attempt this on a distributed index with multiple shards, you’ll get the following error message:
In Conclusion: Joins between SOLR collections are useful but should be taken with caution. As you can see, this query only works with simple non-distributed collections. Additionally, you can only display the fields from the sales collection and not the people collection which is a total bummer. A more common practice is to pre-join the information before it’s indexed. For joining collections with multiple shards, you could also try to attempt this with Spark. Stay tuned on how to do this in Part II of this post.