Support Questions

Spranker · ‎02-15-2017

Hi,

I would like to know the exact count of the documents that match a Solr query, but when I execute the same query twice in a row I get a different number in the "numFound" field.

Therefore, I am wondering if "numFound" is an exact value or an estimate and if there is a better way to find the exact number of documents that match a query.

Thank you.

pdvorak · ‎02-15-2017

numFound is the number that should be returned each time. If it is different, there are a couple of possibilities:

1. You are indexing in real time, so numFound keeps increasing as documents are added or, if you are using the Lily HBase Indexer, may decrease as documents are deleted.

2. Your replicas for a given shard are out of sync. You can check this by sending the same query to each replica of the shard with the following parameter added to the URL: distrib=false

http://solr.server/solr/collection1_shard1_replica1/select?q=*:*&distrib=false

http://solr2.server/solr/collection1_shard1_replica2/select?q=*:*&distrib=false

If those queries return different results and you aren't indexing in real time, there is likely an issue, and you can use DELETEREPLICA and ADDREPLICA to recreate the out-of-sync replica:
https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_mc_solr_service.html#id_s15_n33_45
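The per-replica check described above can be scripted. The following is a minimal sketch, not an official tool: the function names are made up for illustration, it assumes each replica core answers /select with wt=json, and it assumes no authentication is required (a Kerberos-secured cluster would need more than plain urlopen).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def replica_count_url(core_url, query="*:*"):
    """Build a non-distributed count query for a single replica core."""
    # rows=0 because only numFound is needed; distrib=false keeps the
    # query on this core instead of fanning out to the whole collection.
    params = urlencode({"q": query, "rows": 0, "distrib": "false", "wt": "json"})
    return f"{core_url.rstrip('/')}/select?{params}"


def num_found(response_text):
    """Extract numFound from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]


def fetch_text(url):
    """Fetch a URL and return the body as text (needs a reachable Solr)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def compare_replicas(core_urls, fetch=fetch_text, query="*:*"):
    """Return ({core_url: numFound}, in_sync) for the replicas of one shard."""
    counts = {u: num_found(fetch(replica_count_url(u, query))) for u in core_urls}
    in_sync = len(set(counts.values())) <= 1
    return counts, in_sync
```

For example, compare_replicas(["http://solr.server/solr/collection1_shard1_replica1", "http://solr2.server/solr/collection1_shard1_replica2"]) would return the two counts and whether they agree. If real-time indexing is running, a small difference may just reflect timing between the two requests.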

-pd

Spranker · ‎02-16-2017

Thank you for your answer.

I checked and I have some recovering shards, but if I try to make a query on a collection that only has active shards the "numFound" field does not change. Therefore, if I fix this issue the "numFound" field should contain the exact document count and not some estimate, correct?

I have one more question: if I run a query from a Hue app that uses Cloudera Search, is the number next to the arrows used to change page the exact document count from the "numFound" field?

Thank you.

Paul Yang · ‎12-21-2017

Hi,

I have run into the same issue, or possibly a different one. I do real-time indexing of HBase with the Cloudera Search Lily indexer. The collection has 3 shards and 6 replicas in total. With real-time indexing alone, numFound was correct when I queried. After the real-time indexer had run, I also indexed with hbase-indexer-mr-1.5-*-job.jar, and then the issue appeared: the numFound of the query became very strange. numFound is 80.

numFound should be 40, since the HBase table has only 40 rows.

Below are the results of the queries you mentioned.

http://oddev05.dev1.fn:8983/solr/test_lily_solr_shard1_replica2/select?q=*:*&distrib=false     <result name="response" numFound="28" start="0">
http://oddev03.dev1.fn:8983/solr/test_lily_solr_shard1_replica1/select?q=*:*&distrib=false     <result name="response" numFound="28" start="0">
http://oddev03.dev1.fn:8983/solr/test_lily_solr_shard2_replica2/select?q=*:*&distrib=false     <result name="response" numFound="28" start="0">
http://oddev04.dev1.fn:8983/solr/test_lily_solr_shard2_replica1/select?q=*:*&distrib=false     <result name="response" numFound="28" start="0">
http://oddev04.dev1.fn:8983/solr/test_lily_solr_shard3_replica2/select?q=*:*&distrib=false     <result name="response" numFound="24" start="0">
http://oddev05.dev1.fn:8983/solr/test_lily_solr_shard3_replica1/select?q=*:*&distrib=false     <result name="response" numFound="24" start="0">
*Counting one replica per shard, the numFound values of the 3 shards add up to 80 (28 + 28 + 24).*

http://oddev04.dev1.fn:8983/solr/test_lily_solr/select?q=*:*&start=0&rows=100                  <result name="response" numFound="40" start="0" maxScore="1.0">  *correct* 
http://oddev04.dev1.fn:8983/solr/test_lily_solr/select?q=*:*&start=0&rows=10                   <result name="response" numFound="80" start="0" maxScore="1.0">	 
http://oddev04.dev1.fn:8983/solr/test_lily_solr/select?q=*:*                                   <result name="response" numFound="80" start="0" maxScore="1.0">
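One reading that would fit these numbers, though nothing in this thread confirms it, is that the same document ids exist on more than one shard: the per-shard counts add up to 80 while only 40 distinct documents exist. A rough way to test that is to collect the ids held by each shard (for instance with fl=id and distrib=false, assuming the uniqueKey field is named id) and look for ids that occur on more than one shard. The sketch below only does the comparison; the helper names and the sample ids are hypothetical.

```python
from collections import defaultdict


def ids_on_multiple_shards(ids_by_shard):
    """Map each id found on more than one shard to the shards holding it."""
    shards_by_id = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            shards_by_id[doc_id].add(shard)
    return {d: sorted(s) for d, s in shards_by_id.items() if len(s) > 1}


def summarize(ids_by_shard):
    """Return (sum of per-shard id counts, number of distinct ids overall)."""
    total = sum(len(set(ids)) for ids in ids_by_shard.values())
    distinct = len(set().union(*ids_by_shard.values())) if ids_by_shard else 0
    return total, distinct


# Hypothetical sample data, not taken from the cluster in this thread.
sample = {
    "shard1": ["row1", "row2", "row3"],
    "shard2": ["row2", "row4"],
    "shard3": ["row3", "row4", "row5"],
}
duplicates = ids_on_multiple_shards(sample)
total, distinct = summarize(sample)
```

If total is larger than distinct, the difference is the number of extra copies sitting on the wrong shards, and duplicates shows which ids are affected.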

I searched on Google and found this:

Preventing the problem is easy -- always index documents onto the correct shard.

I think that may be right. But how do I make sure documents are indexed onto the same, correct shard with both Lily and MapReduce? I would like some more pointers.

Thank you

Paul Yang · ‎02-07-2018

Hi, is there any update on my last question? I cannot get the correct numFound after I run:

HADOOP_OPTS="-Djava.security.auth.login.config=jaas.conf" \
hadoop --config /etc/hadoop/conf jar /opt/cloudera/parcels/CDH/lib/hbase-solr/tools/hbase-indexer-mr-1.5-cdh5.11.2-job.jar \
  --conf /etc/hbase/conf/hbase-site.xml \
  -Dmapreduce.job.queuename=root.hadoop.plarch \
  --hbase-indexer-zk oddev03:2181,oddev04:2181,oddev05:2181 \
  --hbase-indexer-name onedata_order_orderIndexer \
  --go-live

Is this a known issue? If so, what is the workaround? If not, how should I correct the command line above?

Thanks for your reply.

BR

Paul

Solr query document count