I have a CDH installation where I'm using Cloudera Search on 11 nodes.
I'd like to have a replicated collection with as many shards as I can distribute among the nodes.
For 11 nodes, I figured a setup of 5 shards with a replication factor of 2 would do. That places 10 replicas and leaves one node without a shard, which is fair enough since 11 is a prime number. Maybe not the best number to work with, but I'll have to live with it.
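To make the layout concrete, here is a minimal sketch of the arithmetic and of a plain Solr Collections API CREATE request that would describe it. The host `node1.example.com`, port `8983`, and the collection name `mycollection` are placeholders rather than values from this cluster, and `maxShardsPerNode=1` is an assumption about how the replicas are spread (one per node).

```python
from urllib.parse import urlencode

NODES = 11
NUM_SHARDS = 5
REPLICATION_FACTOR = 2

# Every shard is stored REPLICATION_FACTOR times, so this many replicas get placed.
total_replicas = NUM_SHARDS * REPLICATION_FACTOR
# Assuming one replica per node, the leftover nodes host nothing of this collection.
idle_nodes = NODES - total_replicas

params = {
    "action": "CREATE",
    "name": "mycollection",            # placeholder collection name
    "numShards": NUM_SHARDS,
    "replicationFactor": REPLICATION_FACTOR,
    "maxShardsPerNode": 1,             # assumption: at most one replica per node
}
# Placeholder host and port; any live Solr node should accept the request.
create_url = "http://node1.example.com:8983/solr/admin/collections?" + urlencode(params)

print(total_replicas, idle_nodes)
print(create_url)
```

The URL is only built here, not sent, so the snippet can be run without touching a cluster.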
However, I'm finding that if I query the collection on the node that holds no shard of it, that node does not recognize the collection. Is this expected? I'd expect it to coordinate with the rest of the nodes via ZooKeeper to return the correct response, regardless of where the data is.
I'm struggling to see the best way to access SolrCloud from an application. I'd like a single point of contact so that applications can direct their queries to one service (one service URL instead of, e.g., randomizing connections across a predefined set of servers). Knowing that I might connect to a node that has no shard, and that the query would then fail, is not comforting either.
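For illustration, the client-side alternative mentioned above (randomizing over a predefined set of servers) could look roughly like the sketch below. It is not a recommended design, only a way to show what the application would have to carry without a single service URL. The function names `http_get_json` and `query_any_node`, the node URLs, and the collection name are all hypothetical; the only Solr-specific assumption is the standard `/solr/<collection>/select` query path.

```python
import json
import random
from urllib.error import URLError
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get_json(url):
    """Fetch a URL and decode its JSON body (performs a real network call)."""
    with urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def query_any_node(nodes, collection, query, fetch=http_get_json):
    """Try the given nodes in random order; return the first successful response."""
    params = urlencode({"q": query, "wt": "json"})
    last_error = None
    for base in random.sample(nodes, len(nodes)):
        url = f"{base}/solr/{collection}/select?{params}"
        try:
            return fetch(url)
        except (URLError, OSError) as exc:
            # For example a node that is down, or one that rejects the collection.
            last_error = exc
    raise RuntimeError(f"no node answered for {collection!r}") from last_error
```

A call would look like `query_any_node(["http://node1.example.com:8983", "http://node2.example.com:8983"], "mycollection", "*:*")`. The `fetch` parameter exists only so the retry logic can be exercised without a live cluster.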