Member since: 01-18-2016
Posts: 163
Kudos Received: 32
Solutions: 19
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 1378 | 04-06-2018 09:24 PM |
| 1400 | 05-02-2017 10:43 PM |
| 3836 | 01-24-2017 08:21 PM |
| 23550 | 12-05-2016 10:35 PM |
| 6450 | 11-30-2016 10:33 PM |
06-27-2016
02:52 PM
Awesome. Thanks.
06-23-2016
03:20 AM
1 Kudo
@rbiswas, you may have already read this, but there's some good info here on what they describe as a "real world" production configuration using the new cross-data-center replication: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462 Since this feature only arrived in 6.0, which was released less than two months ago, production use has probably been limited.

Also, not a best practice, but since long before SolrCloud existed we used a brute-force method of cross-data-center replication for standby Solr instances with the magic of rsync. You can reliably use rsync to copy indexes while they are being updated, but it takes a bit of scripting. I have only done this in non-cloud environments, but I'm fairly sure it can be done in cloud mode as well. It is crude, but it worked for years and uses some of the great features of Linux. Example script, run from crontab on the DR-site nodes:

#step 1 - create a backup first, assuming your current copy is good.
cp -rl "${data_dir}" "${data_dir}.BAK"
#step 2 - Now copy from the primary site; repeat until a pass completes with no errors
status=1
while [ "$status" -ne 0 ]; do
    # trailing slashes make rsync mirror the directory contents in place
    rsync -a --delete "${primary_site_node}:${data_dir}/" "${data_dir}/"
    status=$?
done
echo "COPY COMPLETE!"
That script creates a local backup (instantly, via hard links rather than soft links) and then copies only new or changed files, deleting files on the DR side that have been deleted on the primary/remote side. If files disappear during the rsync copy, it copies again until nothing changes during a pass. It can be run from crontab, but it does need a bit of bullet-proofing. Simple. Crude. It works.
06-21-2016
05:25 PM
1 Kudo
Note that SolrCloud's replication is not intended to go across data centers, due to the volume of traffic and its dependency on ZooKeeper ensembles. However, the recently released 6.x line added a dedicated replication mechanism for going across data centers: https://issues.apache.org/jira/browse/SOLR-6273, which is based on this description: http://yonik.com/solr-cross-data-center-replication/ Basically, this is cross-cluster replication, which is different from the standard SolrCloud replication mechanism.
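For reference, here's a minimal sketch of what the CDCR wiring can look like in the source cluster's solrconfig.xml, based on the Solr 6 documentation; the collection names, target ZooKeeper address, and tuning values are placeholders, so check them against the reference guide for your release:

```xml
<!-- Source-cluster solrconfig.xml sketch; zkHost and collection names are placeholders. -->
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">dr-zk1:2181,dr-zk2:2181/solr</str>
    <str name="source">my_collection</str>
    <str name="target">my_collection</str>
  </lst>
  <lst name="replicator">
    <str name="threadPoolSize">2</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>
</requestHandler>

<!-- Both source and target clusters also need the CDCR-aware update log (inside <updateHandler>). -->
<updateLog class="solr.CdcrUpdateLog">
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>
```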
06-08-2016
12:41 PM
1 Kudo
@David Lam did this work for you?
06-04-2016
01:20 AM
@David Lam I believe that would be pretty straightforward:

1) Write your SOAP client.
2) Integrate it into a custom class that extends UpdateRequestProcessorFactory, doing the work in the processAdd(AddUpdateCommand) method of the processor it returns.
3) Create your updateRequestProcessorChain in solrconfig.xml.
4) Add the chain to a requestHandler.

Look at the ConditionalCopyProcessorFactory example here, but make your SOAP call instead (there's a rough sketch below): https://wiki.apache.org/solr/UpdateRequestProcessor In processAdd, since you have access to the SolrInputDocument, you can read any field's value, and you can also add a new field or a child document, which is just another SolrInputDocument. Be aware that doing this could add a bottleneck to your update pipeline if the web service is slow or down, and it could also add stress on the Solr server, which may be serving queries at the same time. It just depends on your load and other factors. Alternatively, you could annotate the SolrDocuments in your ingest pipeline, before they are sent to Solr, to push the load elsewhere. If you have no control over the ingest pipeline, you may have to do it in the update request chain anyway.
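To make step 2 concrete, here's a rough sketch; the class name SoapEnrichmentProcessorFactory, the field name enrichment_data, and the callSoapService() helper are all made up for illustration, not taken from the wiki example:

```java
package com.example.solr;

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

// Hypothetical factory that enriches each incoming document via an external SOAP call.
public class SoapEnrichmentProcessorFactory extends UpdateRequestProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                            SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object id = doc.getFieldValue("id");
        // Call the external service (placeholder helper) and attach the result as a new field.
        doc.addField("enrichment_data", callSoapService(id));
        // Pass the document on to the rest of the chain so it still gets indexed.
        super.processAdd(cmd);
      }
    };
  }

  private String callSoapService(Object id) {
    // Placeholder: a real implementation would invoke the SOAP client here.
    return "enriched-value-for-" + id;
  }
}
```

The factory then gets referenced from the updateRequestProcessorChain you declare in solrconfig.xml (steps 3 and 4 above).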
05-24-2016
06:57 PM
@Nasheb Ismaily, double-check your configuration. I know you already know this, but for the sake of a complete answer, here's how to configure FIFO. The Capacity Scheduler queues can be configured for FIFO or fair ordering via Ambari's YARN Queue Manager (top-right button); the default is FIFO.

Via Ambari - YARN Capacity Scheduler queue configuration: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_performance_tuning/content/section_create_configure_yarn_capacity_scheduler_queues.html

Manually: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_yarn_resource_mgt/content/flexible_scheduling_policies.html

Also, the YARN Fair Scheduler itself can be configured for FIFO: https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html "schedulingPolicy: to set the scheduling policy of any queue. The allowed values are "fifo"/"fair"/"drf" or any class that extends…"
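If you end up editing the configuration by hand, the per-queue ordering policy lives in capacity-scheduler.xml; a minimal sketch, assuming the default queue path root.default (verify the exact property name against the HDP docs linked above):

```xml
<property>
  <name>yarn.scheduler.capacity.root.default.ordering-policy</name>
  <value>fifo</value>
</property>
```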
05-19-2016
12:51 AM
@Nasheb Ismaily, according to what I read, applications run in FIFO order based on submission time. If you submit them back to back very quickly, is it possible the timestamps are identical and they effectively arrived at the "same time"?
05-19-2016
12:41 AM
I want to correct what I said above. On the client, kafka.consumer does exist, but I was looking on the broker. And the answer below is correct: the newly named metric/MBean is on the new consumer.
05-18-2016
11:59 PM
@schintalapani I believe it is the new consumer; I am using the interface provided by this (http://kafka.apache.org/documentation.html#newconsumerapi). <dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>0.9.0.0</version>
</dependency>
The consumer class is org.apache.kafka.clients.consumer.KafkaConsumer<String,String>. However, I may be missing a fundamental point: I have mostly been looking at MBeans on the broker, not the client. I did run the client, but I still did not see kafka.consumer MBeans there either.
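For context, here's a minimal sketch of how that new consumer interface is used; the broker address, group id, and topic name are placeholders:

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NewConsumerExample {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:6667");  // placeholder broker
    props.put("group.id", "example-group");            // placeholder group id
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    // The kafka.consumer MBeans discussed above are registered in this client JVM, not on the broker.
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList("test-topic"));  // placeholder topic
    try {
      while (true) {
        ConsumerRecords<String, String> records = consumer.poll(1000);
        for (ConsumerRecord<String, String> record : records) {
          System.out.printf("offset=%d key=%s value=%s%n",
              record.offset(), record.key(), record.value());
        }
      }
    } finally {
      consumer.close();
    }
  }
}
```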