Solr query document count
Labels: Apache Solr, Cloudera Search
Created on ‎02-15-2017 02:08 PM - edited ‎09-16-2022 04:06 AM
Hi,
I would like to know the exact count of the documents that match a Solr query, but when I execute the same query twice in a row I get a different number in the "numFound" field.
Therefore, I am wondering whether "numFound" is an exact value or an estimate, and whether there is a better way to find the exact number of documents that match a query.
Thank you.
Created ‎02-15-2017 02:15 PM
1. You are indexing in real time, so numFound keeps increasing; or, if you are using the Lily HBase Indexer, documents could also be deleted.
2. Your replicas for a given shard are out of sync. You can find out whether this is the case by sending the same query to each replica in the shard and adding the following parameter to the URL string: distrib=false
http://solr.server/solr/collection1_shard1_replica1/select?q=*:*&distrib=false
http://solr2.server/solr/collection1_shard1_replica2/select?q=*:*&distrib=false
If those queries return different results and you aren't doing real-time indexing, then there is likely an issue, and you can use DELETEREPLICA and ADDREPLICA to recreate the replica so that it is back in sync:
https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_mc_solr_service.html#id_s15_n33_45
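The replica comparison described above can be scripted. The following is a minimal sketch, not a definitive tool: the helper names (replica_query_url, num_found, out_of_sync) are hypothetical, and adding rows=0 and wt=json to the query is an assumption made here to get a small JSON response; only distrib=false comes from the answer itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def replica_query_url(core_url, query="*:*"):
    """Build a non-distributed count query for a single replica core.

    distrib=false keeps Solr from fanning the query out to other shards;
    rows=0 and wt=json are assumptions added to keep the response small.
    """
    params = urlencode({"q": query, "distrib": "false", "rows": 0, "wt": "json"})
    return f"{core_url.rstrip('/')}/select?{params}"


def num_found(core_url, query="*:*", fetch=None):
    """Return numFound as reported by one replica core.

    `fetch` is a hypothetical hook (URL -> response body) so the function
    can be exercised without a live Solr server.
    """
    fetch = fetch or (lambda url: urlopen(url, timeout=10).read())
    body = json.loads(fetch(replica_query_url(core_url, query)))
    return body["response"]["numFound"]


def out_of_sync(counts):
    """Return True if the replicas of one shard report different counts."""
    return len(set(counts.values())) > 1


if __name__ == "__main__":
    # Replica core URLs taken from the example URLs in this answer.
    replicas = [
        "http://solr.server/solr/collection1_shard1_replica1",
        "http://solr2.server/solr/collection1_shard1_replica2",
    ]
    counts = {url: num_found(url) for url in replicas}
    print(counts, "out of sync" if out_of_sync(counts) else "in sync")
```

If indexing is still running, a difference between two replicas may only reflect timing, so the comparison is meaningful only once indexing has stopped.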
-pd
Created ‎02-16-2017 01:07 AM
Thank you for your answer.
I checked, and I have some recovering shards, but if I query a collection that only has active shards, the "numFound" field does not change. Therefore, once I fix this issue, the "numFound" field should contain the exact document count and not an estimate, correct?
I have one more question: if I make a query from a Hue app that uses Cloudera Search, is the number next to the arrow that lets you change pages the exact document count from the "numFound" field?
Thank you.
Created on ‎12-21-2017 06:44 PM - edited ‎12-21-2017 10:38 PM
Hi,
I am hitting the same issue, or maybe a different one. I run real-time indexing for HBase with the Cloudera Search Lily indexer. There is a collection with 3 shards and 6 replicas. After the real-time indexer runs, numFound is correct when I query. But when I then index with hbase-indexer-mr-1.5-*-job.jar, the issue appears: numFound becomes very strange. It is 80.
numFound should be 40, since the HBase table has just 40 rows.
Below are the results of the queries I ran as you mentioned.
http://oddev05.dev1.fn:8983/solr/test_lily_solr_shard1_replica2/select?q=*:*&distrib=false
<result name="response" numFound="28" start="0">
http://oddev03.dev1.fn:8983/solr/test_lily_solr_shard1_replica1/select?q=*:*&distrib=false
<result name="response" numFound="28" start="0">
http://oddev03.dev1.fn:8983/solr/test_lily_solr_shard2_replica2/select?q=*:*&distrib=false
<result name="response" numFound="28" start="0">
http://oddev04.dev1.fn:8983/solr/test_lily_solr_shard2_replica1/select?q=*:*&distrib=false
<result name="response" numFound="28" start="0">
http://oddev04.dev1.fn:8983/solr/test_lily_solr_shard3_replica2/select?q=*:*&distrib=false
<result name="response" numFound="24" start="0">
http://oddev05.dev1.fn:8983/solr/test_lily_solr_shard3_replica1/select?q=*:*&distrib=false
<result name="response" numFound="24" start="0">
The total numFound of the 3 shards is 80.
http://oddev04.dev1.fn:8983/solr/test_lily_solr/select?q=*:*&start=0&rows=100
<result name="response" numFound="40" start="0" maxScore="1.0"> (correct)
http://oddev04.dev1.fn:8983/solr/test_lily_solr/select?q=*:*&start=0&rows=10
<result name="response" numFound="80" start="0" maxScore="1.0">
http://oddev04.dev1.fn:8983/solr/test_lily_solr/select?q=*:*
<result name="response" numFound="80" start="0" maxScore="1.0">
Searching on Google, I found:
Preventing the problem is easy -- always index documents onto the correct shard.
I think that may be right. But how do I index documents onto the correct, same shard with Lily and MapReduce? I would like to know more details.
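One way to test whether the quoted explanation applies is to check whether the same document id lives on more than one shard. The per-shard counts reported above sum to 28 + 28 + 24 = 80 while the table has 40 rows, which would be consistent with that, but it remains a hypothesis. The sketch below is only illustrative: the function names are hypothetical, and it assumes the id lists were collected separately from one replica per shard (for example with distrib=false queries returning only the unique key field).

```python
from collections import defaultdict


def ids_on_multiple_shards(ids_by_shard):
    """Map each document id found on more than one shard to those shards."""
    shards_by_id = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            shards_by_id[doc_id].add(shard)
    return {
        doc_id: sorted(shards)
        for doc_id, shards in shards_by_id.items()
        if len(shards) > 1
    }


def count_summary(ids_by_shard):
    """Return (sum of per-shard counts, number of distinct ids)."""
    total = sum(len(ids) for ids in ids_by_shard.values())
    distinct = len({doc_id for ids in ids_by_shard.values() for doc_id in ids})
    return total, distinct


if __name__ == "__main__":
    # Hypothetical sample data: every id appears on two shards, so the
    # per-shard sum is twice the number of distinct documents.
    sample = {
        "shard1": ["row1", "row2"],
        "shard2": ["row2", "row3"],
        "shard3": ["row3", "row1"],
    }
    print(count_summary(sample))
    print(ids_on_multiple_shards(sample))
```

If the sum of the per-shard counts exceeds the number of distinct ids, some documents were written to more than one shard, and the ids returned by the first function show which ones.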
Thank you
Created on ‎02-07-2018 11:06 PM - edited ‎02-07-2018 11:22 PM
Hi, is there any update on the last question? I cannot get the correct numFound after I run:
HADOOP_OPTS="-Djava.security.auth.login.config=jaas.conf" \
hadoop --config /etc/hadoop/conf jar /opt/cloudera/parcels/CDH/lib/hbase-solr/tools/hbase-indexer-mr-1.5-cdh5.11.2-job.jar \
  --conf /etc/hbase/conf/hbase-site.xml \
  -Dmapreduce.job.queuename=root.hadoop.plarch \
  --hbase-indexer-zk oddev03:2181,oddev04:2181,oddev05:2181 \
  --hbase-indexer-name onedata_order_orderIndexer \
  --go-live
Is this a known issue? If so, what is the workaround? If not, how should I correct the command line above?
Thanks for your reply.
BR
Paul
