Posts: 177
Topics: 8
Kudos: 28
Solutions: 19
Registered: 07-16-2015

solrctl issue (implementation issue?)

Hi,

We have been experiencing issues with solrctl for more than a year, and only recently have we dedicated time to investigating them.

Here is what we found:

* solrctl uses too much memory when processing a command (it can use several GB of memory)

* solrctl takes too much time to complete a command (listing collections or creating a collection can take more than one minute)

For the last year, we lived with these two issues without investigating the root cause because, ultimately, it was "working". But today, these two issues prevent us from fine-tuning our YARN application (memory for map tasks) AND make production interventions (like deploying a new application) take far too long.

Also, note that the issue gets worse over time (it is linked to the number of collections hosted in the SolrCloud cluster).

Now, here is what we found inside the component that is causing these two issues.

To function properly, solrctl relies on ZooKeeper to know the state and configuration of the SolrCloud cluster. And that is fine.

But what is not fine is the way it is implemented.

solrctl asks ZooKeeper for the "whole" content of a znode (/solr), including the content of its child znodes, their children, and so on, and loads it into a variable (in memory). Then solrctl uses this variable to get the information it needs (does Solr use http or https, what is the list of collections, ...).

This means that solrctl loads the whole SolrCloud configuration, including "instancedir" and "collections". Yes, solrctl loads the configuration of each instancedir (schema.xml content, solrconfig.xml content, ...).

This approach works fine for a newly created SolrCloud cluster, since the /solr znode containing the SolrCloud state and configuration is small at the beginning.

But then you create collections in SolrCloud, and with each new collection solrctl takes more time and more memory.

Today, we have more than 300 collections in production.

Simply asking solrctl to list the collections takes more than one minute:

"solrctl --zk <zk_quorum>/solr collection --list"

This particular command needs only one thing from ZooKeeper:

* the list of hosted collections

But to get that information, solrctl loads the whole /solr znode instead of fetching only the list of collections. That is clearly not optimized.

And the same can be said about all the other commands: "collection --create", "instancedir --list", ...

Has anyone else encountered the same problem?

Is there some way to use solrctl and get reasonable response times?

Isn't there a design issue in that component?

In our case, we made a clone of the solrctl tool in which we replaced the part that loads the whole ZooKeeper content with more specific accesses that request only the needed information.

By doing that, we observed two things:

* our clone of solrctl uses less memory

* our clone of solrctl responds in less than 2 seconds for every command we optimized
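One way to do such a targeted access for "collection --list" is to ask ZooKeeper only for the children of the collections znode. A minimal sketch, assuming ZooKeeper's own zkCli.sh is available (its "ls" command returns only the direct children of one znode) and that the SolrCloud chroot is /solr; ZK_CLI and the function names are hypothetical, and this is an illustration, not the code of the clone mentioned above:

```shell
#!/usr/bin/env bash
# Sketch: list SolrCloud collections by reading only the children of
# <chroot>/collections, instead of dumping the whole /solr tree.
# Assumptions: ZooKeeper's zkCli.sh (not Solr's zkcli.sh) is on PATH or
# named by ZK_CLI, and the Solr chroot is /solr unless told otherwise.
# ZK_CLI, parse_ls_output and list_collections are hypothetical names.

ZK_CLI="${ZK_CLI:-zkCli.sh}"

# zkCli.sh prints the children as "[coll_a, coll_b]" after its connection
# messages; keep the last bracketed line and emit one name per line.
parse_ls_output() {
  grep -E '^\[.*]$' | tail -n 1 \
    | sed -e 's/^\[//' -e 's/]$//' \
    | tr ',' '\n' \
    | sed -e 's/^ *//' -e '/^$/d'
}

# Usage: list_collections <zk_host:port[,...]> [chroot]
list_collections() {
  local zk_server="$1"
  local chroot="${2:-/solr}"
  # One "ls" on a single znode: only child names come back, no configs.
  "$ZK_CLI" -server "$zk_server" ls "${chroot}/collections" 2>/dev/null \
    | parse_ls_output
}
```

For example, "list_collections zk1:2181" (hypothetical host) prints one collection name per line. The amount of data returned depends only on the collection names under /solr/collections, not on the size of every instancedir stored under /solr.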

Re: solrctl issue (implementation issue?)

This is the part of solrctl that causes the issue (it browses and gets the whole znode content, including its children):

  if [ -z "$SOLR_STATE" ] ; then
    SOLR_STATE=`eval $SOLR_ADMIN_ZK_CMD -cmd list 2>/dev/null`
  fi

And SOLR_ADMIN_ZK_CMD is defined as:

SOLR_ADMIN_ZK_CMD='ZKCLI_JVM_FLAGS=${ZKCLI_JVM_FLAGS} ${SOLR_HOME}/bin/zkcli.sh -zkhost $SOLR_ZK_ENSEMBLE 2>/dev/null'

solrctl uses the zkcli shipped with Solr (not the basic one shipped with ZooKeeper).
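To get an idea of how large that dump has become on a given cluster, one can run the same zkcli.sh "-cmd list" call on its own and count the bytes it prints. A minimal sketch, assuming SOLR_HOME and SOLR_ZK_ENSEMBLE are set the way solrctl sets them (SOLR_ZK_ENSEMBLE being the value given to --zk, e.g. <zk_quorum>/solr); the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: measure the size of the full dump that solrctl captures into
# SOLR_STATE. Assumes SOLR_HOME and SOLR_ZK_ENSEMBLE are set as solrctl
# sets them; zk_list_all and state_dump_bytes are hypothetical names.

# Same zkcli.sh call solrctl makes (without eval): walks the whole tree
# under the chroot given in SOLR_ZK_ENSEMBLE.
zk_list_all() {
  ZKCLI_JVM_FLAGS="${ZKCLI_JVM_FLAGS:-}" \
    "${SOLR_HOME}/bin/zkcli.sh" -zkhost "$SOLR_ZK_ENSEMBLE" -cmd list 2>/dev/null
}

# Print the number of bytes in that dump (tr strips the padding some
# wc implementations add).
state_dump_bytes() {
  zk_list_all | wc -c | tr -d ' '
}
```

Comparing the result on a fresh cluster with the result on a cluster hosting hundreds of collections gives a rough idea of how much output solrctl captures into SOLR_STATE before answering a command.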

Re: solrctl issue (implementation issue?)

For information, I have raised this topic with Cloudera support and they acknowledged the optimization.

It seems they are willing to include it in a future version of CDH.