Member since: 06-07-2016
Posts: 923
Kudos Received: 322
Solutions: 115
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3993 | 10-18-2017 10:19 PM |
| | 4255 | 10-18-2017 09:51 PM |
| | 14634 | 09-21-2017 01:35 PM |
| | 1773 | 08-04-2017 02:00 PM |
| | 2358 | 07-31-2017 03:02 PM |
03-20-2017
09:39 PM
1 Kudo
@G P You can use nested types in Hive with ORC (structs, arrays, maps). This way you don't have to explode each record into multiple rows. It also gives you the benefit of denormalization: the elements of the same array are stored together on disk, which helps reads.
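To make the difference in shape concrete, here is a minimal Python sketch (not Hive code). The order and item field names are made up for illustration; the nested record is roughly what a Hive column typed as an array of structs would hold, and `explode` produces the one-row-per-item layout that nested types let you avoid.

```python
# Hypothetical data: one order holding an array of item structs,
# similar in shape to a Hive column typed array<struct<sku:string,qty:int>>.
nested_orders = [
    {"order_id": 1, "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"order_id": 2, "items": [{"sku": "C", "qty": 5}]},
]


def explode(orders):
    """Flatten each order into one row per item (the layout nested types avoid)."""
    return [
        {"order_id": order["order_id"], "sku": item["sku"], "qty": item["qty"]}
        for order in orders
        for item in order["items"]
    ]


exploded_rows = explode(nested_orders)
print(len(nested_orders), "nested records vs", len(exploded_rows), "exploded rows")
```

In the nested form, each order stays a single record and its items travel with it; in the exploded form, the order key is repeated on every item row.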
03-17-2017
10:13 PM
1 Kudo
@Michael Young
One last question. We have a Prod Solr cluster using HDFS as its file system. Assume the following two scenarios: 1. Solr is also running on DR. When we replicate data to DR using a snapshot/DistCp combo, how does the DR Solr know which data belongs to which index? I am guessing it doesn't. In that case, how do we manage that? 2. Solr is not running on DR. We replicate the data to DR. Some issue occurs in production and now we need to restore data back to Prod. Can we restore only some indexes? If yes, how is that possible, since DR doesn't have any Solr and for DR it's simply some HDFS data?
Labels: Apache Solr
03-17-2017
09:37 PM
Thanks @Michael Young. I'll ask the customer to look into ReplicationHandler. In addition, when you say "use standard filesystem methods", that means HDFS DistCp in this case, because Solr is running on top of HDFS. Is that right?
03-17-2017
01:54 AM
I have a customer who is running Solr 4.10.3. Is there a cross data center replication mechanism available for this version? If not, what is the best practice to keep DR in sync?
Labels: Apache Solr
03-10-2017
06:29 PM
@suresh krish Where is the error? I don't see any error, just the details of your config.
03-10-2017
05:03 AM
2 Kudos
@milind pandit It would require a lot of work, most of which would be similar to what has already been implemented. Since this is open source, you can fork it and add your features, or work with the community to add support for other cloud platforms.
https://github.com/sequenceiq/cloudbreak
03-07-2017
08:14 PM
1 Kudo
@Kumar Veerappan Your idea is reasonable given that there is no supported HA for Ambari. The following JIRA remains unresolved: https://issues.apache.org/jira/browse/AMBARI-17126 But what if the external tool that's monitoring Ambari goes down? Would you have another tool monitoring the tool that monitors Ambari? See the problem? That said, you can use an external tool like Upstart or Supervisor to monitor Ambari and then fail over to a standby server. Please see the following link for more details on how to achieve this: https://community.hortonworks.com/questions/402/how-to-setup-high-availability-for-ambari-server.html
03-07-2017
02:38 PM
@Chris Lenz This should be possible using the JoltTransformJSON processor. https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.JoltTransformJSON/index.html It helps you flatten your JSON. The output still remains JSON. It's a little more complex to learn, so it might take a day or two to understand all of it, but it's very handy. You should look at both of the links below: https://docs.google.com/presentation/d/1sAiuiFC4Lzz4-064sg1p8EQt2ev0o442MfEbvrpD1ls/edit#slide=id.g9680451c_043 https://github.com/bazaarvoice/jolt
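If it helps to picture what "flatten" means here, below is a plain Python sketch of the idea. This is not a Jolt spec and not how the processor is configured; the dotted-key naming and the sample input are assumptions made up for illustration. The point is only that nested JSON goes in and single-level JSON comes out.

```python
import json


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into a single-level dict with dotted keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot from the accumulated prefix.
        flat[prefix[:-1]] = value
    return flat


# Hypothetical input document.
nested = {"user": {"name": "a", "tags": ["x", "y"]}, "active": True}
flat = flatten(nested)
print(json.dumps(flat))
```

In Jolt you would express the equivalent mapping declaratively in a spec rather than in code, so treat this only as a picture of the input and output shapes.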
03-07-2017
12:53 AM
@Oriane Glad it was helpful. If you are satisfied with the answer, please accept it.
03-06-2017
02:12 AM
@Yan Liu Depending on how you write the two queries, yes, absolutely, they should give you the same results. As for the LIMIT clause, you can add it at the end of your DISTRIBUTE BY ... SORT BY query, just like you would in the ORDER BY query.
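As a rough picture of why "depending on how you write the two queries" matters, here is a small Python simulation (not Hive code). The rows, the column names, and the toy partitioner are all assumptions for illustration: ORDER BY is modeled as one global sort, and DISTRIBUTE BY dept SORT BY dept, salary as a sort inside each reducer.

```python
from collections import defaultdict

# Hypothetical rows: (dept, salary).
rows = [("b", 30), ("a", 20), ("b", 10), ("a", 40), ("c", 5)]


def order_by(rows):
    """Model of ORDER BY dept, salary: one global sort."""
    return sorted(rows)


def distribute_sort(rows, num_reducers):
    """Model of DISTRIBUTE BY dept SORT BY dept, salary: sort per reducer only."""
    buckets = defaultdict(list)
    for row in rows:
        # Toy partitioner (assumption): route each dept by its first character,
        # so all rows of one dept land on the same reducer.
        buckets[ord(row[0][0]) % num_reducers].append(row)
    return [sorted(buckets[r]) for r in range(num_reducers)]


print(order_by(rows))
print(distribute_sort(rows, 1))
print(distribute_sort(rows, 2))
```

With a single reducer the two models produce the same sequence. With several reducers, the same rows come back and each dept is still sorted, but the order across reducers is not a global order.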