Member since 01-18-2016
164 Posts
32 Kudos Received
20 Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
137 | 01-14-2025 06:30 PM | |
1420 | 04-06-2018 09:24 PM | |
1451 | 05-02-2017 10:43 PM | |
3958 | 01-24-2017 08:21 PM | |
24177 | 12-05-2016 10:35 PM |
06-29-2016
04:54 PM
1 Kudo
We are enabling Ranger authorization in Hive, but we previously created roles and grants in beeline. Should we remove the manually created Hive grants and roles in beeline before switching from StdSqlAuth to Ranger authorization, or will magic happen? The roles we created match the AD group names in our policies. (HDP 2.4)
Labels: Apache Hive, Apache Ranger
06-27-2016
02:52 PM
Awesome. Thanks.
06-23-2016
03:20 AM
1 Kudo
@rbiswas, you may have read this already, but there's some good info here on what they describe as a "real world" production configuration using the new cross-data-center replication: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462 Since this feature only came out in 6.0, which was released less than 2 months ago, there has probably been limited production use.
Also, this is not a best practice, but since way before Solr Cloud existed, we used a brute-force method of cross-data-center replication for standby Solrs with the magic of rsync. You can reliably use rsync to copy indexes as they are being updated, but there's a bit of scripting required. I have only done this in non-cloud environments, but I'm pretty sure it can be done in cloud as well. It is crude, but it worked for years and uses some of the great features of Linux.
Example script, run in crontab from the DR site nodes:
#step 1 - create a backup first, assuming your current copy is good.
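cp -rl "${data_dir}" "${data_dir}.BAK"
#step 2 - Now copy from the primary site, repeating until rsync exits cleanly
status=1
while [ "$status" -ne 0 ]; do
  # trailing slashes sync the directory contents instead of nesting data_dir inside itself
  rsync -a --delete "${primary_site_node}:${data_dir}/" "${data_dir}/"
  status=$?
done
echo "COPY COMPLETE!"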
That script creates a local backup (instantly, via hard links rather than soft links), then copies only new files and deletes files from DR that have been deleted from the primary/remote site. If files disappear during the rsync copy, it copies again until nothing changes during the rsync. This can be run from crontab, but it does need a bit of bullet-proofing. Simple. Crude. It works.
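As a rough idea of what that bullet-proofing might look like, here is a minimal sketch that caps the number of retries and uses a lock file so overlapping cron runs don't step on each other. The names LOCK_FILE, MAX_ATTEMPTS, retry_until_ok, acquire_lock and release_lock are made up for illustration and are not part of the original script; adjust to taste.

```shell
# Sketch only: all names below are illustrative assumptions, not from the original script.
LOCK_FILE="${LOCK_FILE:-/tmp/solr_dr_sync.lock}"
MAX_ATTEMPTS="${MAX_ATTEMPTS:-5}"

# Run the given command until it exits 0, giving up after MAX_ATTEMPTS tries.
retry_until_ok() {
  attempt=1
  while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Take the lock atomically: with noclobber set, the redirect fails if the file already exists.
acquire_lock() {
  ( set -o noclobber; echo "$$" > "$LOCK_FILE" ) 2>/dev/null
}

# Drop the lock so the next cron run can proceed.
release_lock() {
  rm -f "$LOCK_FILE"
}
```

In the cron script you would call acquire_lock first and exit if it fails, set `trap release_lock EXIT`, and then replace the while loop with something like `retry_until_ok rsync -a --delete "${primary_site_node}:${data_dir}/" "${data_dir}/"`, printing "COPY COMPLETE!" only when it returns 0. Capping the retries means a primary that is down produces a failed run you can alert on, rather than a loop that never ends.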
06-21-2016
05:25 PM
1 Kudo
Note that Solr Cloud's replication is not intended to go across data centers, due to the volume of traffic and the dependency on ZooKeeper ensembles. However, the recently released 6.x added a special replication mode for going across data centers: https://issues.apache.org/jira/browse/SOLR-6273, which is based on this description: http://yonik.com/solr-cross-data-center-replication/ Basically, this is cross-cluster replication, which is different from Solr Cloud's standard replication mechanism.
06-08-2016
12:41 PM
1 Kudo
@David Lam did this work for you?
06-04-2016
01:20 AM
@David Lam I believe that would be pretty straightforward:
1) Write your SOAP client.
2) Integrate it into a custom UpdateRequestProcessor (returned by your own class that extends UpdateRequestProcessorFactory), in its processAdd(AddUpdateCommand) method.
3) Create your updateRequestProcessorChain in solrconfig.xml.
4) Add the chain to a requestHandler.
Look at the example here for the ConditionalCopyProcessorFactory, but make your SOAP call instead: https://wiki.apache.org/solr/UpdateRequestProcessor
In processAdd, since you have access to the SolrInputDocument, you can get any field's value, and you can also add a new field or a child document, which is just another SolrInputDocument.
Be aware that by doing this you could add a bottleneck to your update pipeline if the web service is slow or down, and you could also put additional stress on the Solr server, which is perhaps also serving queries. It just depends on your load and other factors. Alternatively, you could annotate the documents in your ingest pipeline before they are sent to Solr, to push the load elsewhere. If you have no control over the ingest pipeline, you may have to do it in the update request chain anyway.
05-24-2016
06:57 PM
@Nasheb Ismaily, double-check your configuration. I know you already know this, but for the sake of a complete answer, here's how to configure FIFO. The Capacity Scheduler queues can be configured for FIFO or fair ordering via Ambari's YARN Queue Manager (top right button). The default is FIFO.
Via Ambari, YARN Capacity Scheduler queue configuration: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_performance_tuning/content/section_create_configure_yarn_capacity_scheduler_queues.html
Manually: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_yarn_resource_mgt/content/flexible_scheduling_policies.html
Also, the YARN Fair Scheduler can be configured for FIFO: https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html “schedulingPolicy: to set the scheduling policy of any queue. The allowed values are “fifo”/“fair”/“drf” or any class that extends”
05-19-2016
12:51 AM
@Nasheb Ismaily, according to what I read, applications are ordered FIFO by time of submission. If you submit them back to back very quickly, is it possible that the timestamps are identical, so they effectively arrived at the "same time"?
05-19-2016
12:41 AM
I want to correct what I said above. On the client, kafka.consumer exists, but I was looking on the broker. And the answer below is correct (the newly named metric/MBean is on the new consumer).