Created on 02-15-2017 02:08 PM - edited 09-16-2022 04:06 AM
Hi,
I would like to know the exact count of documents that match a Solr query, but when I execute the same query twice in a row I get a different number in the "numFound" field.
Therefore, I am wondering whether "numFound" is an exact value or an estimate, and whether there is a better way to find the exact number of documents that match a query.
Thank you.
Created 02-15-2017 02:15 PM
Created 02-16-2017 01:07 AM
Thank you for your answer.
I checked, and I do have some recovering shards. If I query a collection that has only active shards, the "numFound" field does not change between runs. So once I fix the recovering shards, the "numFound" field should contain the exact document count and not an estimate, correct?
I have one more question: if I run a query from a Hue app that uses Cloudera Search, is the number next to the arrow that lets you change pages the exact document count from the "numFound" field?
Thank you.
Created on 12-21-2017 06:44 PM - edited 12-21-2017 10:38 PM
Hi,
I have run into what looks like the same issue, although it may be a different one. I do real-time indexing for HBase with the Cloudera Search Lily indexer. The collection has 3 shards and 6 replicas. While only the real-time indexer was running, the numFound returned by my queries was correct. After that, I ran a batch index with hbase-indexer-mr-1.5-*-job.jar, and the issue appeared: the numFound returned by queries became very strange and is now 80.
The numFound should be 40, since the HBase table has only 40 rows.
Below are the results of the queries I ran as you suggested.
http://oddev05.dev1.fn:8983/solr/test_lily_solr_shard1_replica2/select?q=*:*&distrib=false
<result name="response" numFound="28" start="0">

http://oddev03.dev1.fn:8983/solr/test_lily_solr_shard1_replica1/select?q=*:*&distrib=false
<result name="response" numFound="28" start="0">

http://oddev03.dev1.fn:8983/solr/test_lily_solr_shard2_replica2/select?q=*:*&distrib=false
<result name="response" numFound="28" start="0">

http://oddev04.dev1.fn:8983/solr/test_lily_solr_shard2_replica1/select?q=*:*&distrib=false
<result name="response" numFound="28" start="0">

http://oddev04.dev1.fn:8983/solr/test_lily_solr_shard3_replica2/select?q=*:*&distrib=false
<result name="response" numFound="24" start="0">

http://oddev05.dev1.fn:8983/solr/test_lily_solr_shard3_replica1/select?q=*:*&distrib=false
<result name="response" numFound="24" start="0">

*the total of the numFound values of the 3 shards is 80*

http://oddev04.dev1.fn:8983/solr/test_lily_solr/select?q=*:*&start=0&rows=100
<result name="response" numFound="40" start="0" maxScore="1.0"> *correct*

http://oddev04.dev1.fn:8983/solr/test_lily_solr/select?q=*:*&start=0&rows=10
<result name="response" numFound="80" start="0" maxScore="1.0">

http://oddev04.dev1.fn:8983/solr/test_lily_solr/select?q=*:*
<result name="response" numFound="80" start="0" maxScore="1.0">
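The per-replica counts above can be tallied with a short script. The following is only a sketch: the counts are copied by hand from the distrib=false results in this post, the `summarize` helper and the core-name pattern (`..._shardN_replicaM`) are assumptions made for illustration, and nothing here talks to Solr. It checks that the replicas of each shard agree with each other and then adds up one count per shard.

```python
import re
from collections import defaultdict

# numFound per core, copied from the distrib=false queries in this post.
CORE_COUNTS = {
    "test_lily_solr_shard1_replica2": 28,
    "test_lily_solr_shard1_replica1": 28,
    "test_lily_solr_shard2_replica2": 28,
    "test_lily_solr_shard2_replica1": 28,
    "test_lily_solr_shard3_replica2": 24,
    "test_lily_solr_shard3_replica1": 24,
}

# Assumed naming convention: <collection>_shardN_replicaM
CORE_NAME = re.compile(r"_(shard\d+)_replica\d+$")


def summarize(core_counts):
    """Return (sum of one count per shard, shards whose replicas disagree)."""
    counts_by_shard = defaultdict(set)
    for core, count in core_counts.items():
        match = CORE_NAME.search(core)
        if match is None:
            raise ValueError(f"unexpected core name: {core}")
        counts_by_shard[match.group(1)].add(count)

    # A shard is out of sync when its replicas report different counts.
    out_of_sync = sorted(
        shard for shard, counts in counts_by_shard.items() if len(counts) > 1
    )
    # Count each shard once, not once per replica.
    total = sum(max(counts) for counts in counts_by_shard.values())
    return total, out_of_sync


total, out_of_sync = summarize(CORE_COUNTS)
print("sum over shards:", total)
print("shards with disagreeing replicas:", out_of_sync)
```

Here the replicas of every shard agree, so replica drift does not explain the numbers. The shards together hold 80 entries while the table has 40 rows, which would be consistent with the same document id being stored on more than one shard, although that is a guess from the counts and not something the output proves.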
I searched on Google and found this:
Preventing the problem is easy -- always index documents onto the correct shard.
I think that may be right. But how do I index documents onto the correct, same shard with both Lily and MapReduce? I would like to know more details.
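One way to check whether documents really landed on more than one shard is to collect the ids held by each shard (for example with `fl=id` and `distrib=false` against one replica per shard) and look for ids that show up more than once. The sketch below assumes those id lists have already been fetched; the `ids_on_multiple_shards` helper and the sample ids are hypothetical and are not taken from the collection in this post.

```python
from collections import defaultdict


def ids_on_multiple_shards(ids_by_shard):
    """Map each id found on more than one shard to the shards that hold it."""
    shards_for_id = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            shards_for_id[doc_id].add(shard)
    return {
        doc_id: sorted(shards)
        for doc_id, shards in shards_for_id.items()
        if len(shards) > 1
    }


# Hypothetical sample data; real ids would come from per-shard queries.
sample = {
    "shard1": ["row-1", "row-2"],
    "shard2": ["row-2", "row-3"],
    "shard3": ["row-3"],
}

duplicates = ids_on_multiple_shards(sample)
for doc_id, shards in sorted(duplicates.items()):
    print(doc_id, "is on", ", ".join(shards))
```

If this reports any ids for the real collection, the real-time indexer and the MapReduce job placed at least some documents on different shards.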
Thank you
Created on 02-07-2018 11:06 PM - edited 02-07-2018 11:22 PM
Hi,
Is there any update on my last question? I still cannot get the correct numFound after I run:
HADOOP_OPTS="-Djava.security.auth.login.config=jaas.conf" \
hadoop --config /etc/hadoop/conf \
  jar /opt/cloudera/parcels/CDH/lib/hbase-solr/tools/hbase-indexer-mr-1.5-cdh5.11.2-job.jar \
  --conf /etc/hbase/conf/hbase-site.xml \
  -Dmapreduce.job.queuename=root.hadoop.plarch \
  --hbase-indexer-zk oddev03:2181,oddev04:2181,oddev05:2181 \
  --hbase-indexer-name onedata_order_orderIndexer \
  --go-live
Is this a known issue? If so, what is the workaround? If not, how should I correct the command line above?
Thanks for your reply.
BR
Paul