
solr performance ( cloudera cdh4.7)

New Contributor

My use case:

I receive a 20 GB file per day (pipe-delimited text file).

I have indexed 90 days of data (20 GB × 90).

Record count: 5.5 billion

Total fields: 30

Indexed fields: called number, calling number, time_key

All other fields are stored (as per schema.xml).

Index size: 300 GB

Number of shards: 4

I used the method below to index (org.apache.solr.hadoop.MapReduceIndexerTool):

 

============================================

hadoop jar /usr/lib/solr/contrib/mr/search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool --morphline-file $path/morphlines.conf --output-dir hdfs://MASTERNODE:8020/$path2 --go-live --zk-host MASTERNODE:2181/solr --collection COLLECTIONNAME --mappers 4 --reducers 12 hdfs://Masternode/path/asd.txt

===========================================

In my test bed I have 4 data nodes and 1 name node (test bed on Cloudera 5.4.7).

Each node has 256 GB of RAM. Are there any performance tuning tips I should follow for Solr?

The first search took around 120 seconds to return 3,000 records (a range query on time_key). After that first query the results are cached, and when I run it again I get a response in less than 1 second, even for larger result sets (10,000 records also come back within 1 second).

Note that when retrieving only 10 to 20 records, performance was good even on the first run.

Regards

Anushke

 

3 REPLIES

Re: solr performance ( cloudera cdh4.7)

Cloudera Employee

Have you tried specifying an auto warming query?  See here for more details: https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig#QuerySettingsinSolrCon...
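
For illustration, a warming query is registered in solrconfig.xml with a QuerySenderListener. The fragment below is only a minimal sketch: it assumes the range query on time_key is the one worth warming, and the open-ended range and the rows value are placeholders that are not taken from the original post. Replace them with a query that matches your typical searches and the type of time_key in your schema.

```xml
<!-- Goes inside the <query> section of solrconfig.xml. -->
<!-- Placeholder query: swap [* TO *] for a time_key range you typically search. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">time_key:[* TO *]</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>

<!-- Same warming query, run whenever a new searcher is opened after a commit. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">time_key:[* TO *]</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>
```

The firstSearcher listener fires when the core starts with no previous searcher to warm from, and newSearcher fires each time a commit opens a new searcher, so the first user query is less likely to hit completely cold caches.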

Re: solr performance ( cloudera cdh4.7)

New Contributor

Thanks for your input, gchanan.

By the way, if I change my solrconfig.xml as per the above input, do I need to re-create the collection and reload all the data? (The data set is 5.5 billion records and would be hard to upload again.)

Or is it enough to edit solrconfig.xml and restart the service?

Re: solr performance ( cloudera cdh4.7)

Contributor

Adding auto warming does not require a re-index.