About mqureshi

mqureshi · ‎03-20-2017

@G P You can use nested types in Hive with ORC (structs, lists, maps), so you don't have to explode each record into multiple rows. This also gives you the benefit of denormalization: the elements of the same array sit physically together on disk, which helps read performance.
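To make the nested-versus-exploded point concrete, here is a small sketch. The table name, column names, and sample record are all made up for illustration; the DDL string shows one way such a table could be declared in Hive, and the Python part only mimics the two layouts in memory (it does not talk to Hive).

```python
# Hypothetical Hive DDL for a table with nested columns stored as ORC.
# Table and column names are assumptions, not from the original thread.
HIVE_DDL = """
CREATE TABLE orders (
  order_id BIGINT,
  customer STRUCT<name:STRING, city:STRING>,
  items    ARRAY<STRUCT<sku:STRING, qty:INT>>,
  attrs    MAP<STRING, STRING>
)
STORED AS ORC
"""

# One illustrative record in the nested layout: a single row per order.
order = {
    "order_id": 1001,
    "customer": {"name": "Ann", "city": "Austin"},  # struct
    "items": [                                        # array of structs
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
        {"sku": "C-3", "qty": 5},
    ],
    "attrs": {"channel": "web"},                      # map
}


def explode_items(rec):
    """Flattened layout: one row per array element, parent fields repeated."""
    return [
        {
            "order_id": rec["order_id"],
            "customer_name": rec["customer"]["name"],
            "sku": item["sku"],
            "qty": item["qty"],
        }
        for item in rec["items"]
    ]


nested_rows = [order]                  # nested layout keeps one row
exploded_rows = explode_items(order)   # exploded layout repeats the parent

print(len(nested_rows), "nested row vs", len(exploded_rows), "exploded rows")
```

The exploded layout repeats `order_id` and the customer fields once per item, which is the duplication the nested types avoid.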

mqureshi · ‎03-17-2017

@Michael Young One last question. We have a Prod SOLR cluster using HDFS as its file system. Assume the following two scenarios: 1. SOLR is also running on DR. When we replicate data to DR using a Snapshot/DistCp combo, how does DR SOLR know which data belongs to which index? I am guessing it doesn't. So in that case, how do we manage that? 2. SOLR is not running on DR. We replicate the data to DR. Some issue occurs in production and now we need to restore data back to Prod. Can we restore only some indexes? If yes, how is that possible, since DR doesn't have any SOLR and for DR it's simply some HDFS data?

mqureshi · ‎03-17-2017

Thanks @Michael Young. I'll ask the customer to look into ReplicationHandler. In addition to that, when you say "use standard filesystem methods", does that mean HDFS DistCp in this case, because SOLR is running on top of HDFS? Is that right?

mqureshi · ‎03-17-2017

I have a customer who is running SOLR 4.10.3. Is there a cross data center replication mechanism available for this version? If not, what is the best practice to keep DR in sync?

mqureshi · ‎03-10-2017

@suresh krish where is the error? I don't see any error. Just the details of your config.

mqureshi · ‎03-10-2017

@milind pandit It would require a lot of work, most of which would be similar to what has already been implemented. Since this is open source, you can fork it and add your features, or work with the community to add support for other cloud platforms. https://github.com/sequenceiq/cloudbreak

mqureshi · ‎03-07-2017

@Kumar Veerappan Your idea is reasonable given there is no supported HA for Ambari. The following JIRA remains unresolved: https://issues.apache.org/jira/browse/AMBARI-17126 But what if the external monitoring tool that's monitoring Ambari goes down? Would you have another tool monitoring your Ambari monitoring tool? See the problem? That said, you can use an external monitoring tool like Upstart or Supervisor to monitor Ambari and then fail over to a standby server. Please see the following link for more details on how to achieve this: https://community.hortonworks.com/questions/402/how-to-setup-high-availability-for-ambari-server.html

mqureshi · ‎03-07-2017

@Chris Lenz This should be possible using the JoltTransformJSON processor. https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.JoltTransformJSON/index.html It helps you flatten your JSON. The output still remains JSON. It's a little more complex to learn, so it might take a day or two to understand all of it, but it's very handy. You should look at both of the links below: https://docs.google.com/presentation/d/1sAiuiFC4Lzz4-064sg1p8EQt2ev0o442MfEbvrpD1ls/edit#slide=id.g9680451c_043 https://github.com/bazaarvoice/jolt
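For anyone unsure what "flatten your JSON" means here, a rough sketch in plain Python follows. This is not Jolt and not a Jolt spec; it is just a hand-written stand-in that shows the kind of JSON-in, JSON-out reshaping meant above. The `flatten` helper, the dotted key naming, and the sample document are all assumptions made for illustration.

```python
import json


def flatten(value, prefix=""):
    """Collapse nested dicts and lists into a single-level dict.

    Keys are joined with dots; list positions become numeric path segments.
    This is a plain-Python illustration, not how Jolt works internally.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            path = f"{prefix}.{index}" if prefix else str(index)
            flat.update(flatten(child, path))
    else:
        # Leaf value: record it under the accumulated path.
        flat[prefix] = value
    return flat


# Hypothetical input document containing an array, as in the question.
doc = {"id": 7, "user": {"name": "ann"}, "tags": ["x", "y"]}
flat_doc = flatten(doc)

# The result is still JSON, just without nesting.
print(json.dumps(flat_doc))
```

In NiFi you would express the reshaping as a Jolt specification on the processor instead of writing code; the links above cover the spec syntax.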

mqureshi · ‎03-07-2017

@Oriane Glad it was helpful. If you are satisfied with the answer, please accept it.

mqureshi · ‎03-06-2017

@Yan Liu Depending on how you write the two queries, yes, absolutely, they should give you the same results. As for the LIMIT clause, you can add it at the end of your DISTRIBUTE BY ... SORT BY query, just as you would in the ORDER BY query.

Member Since	‎06-07-2016 09:05 AM
Last Visited	‎10-31-2017 03:17 AM
Posts	923
Kudos received	310

Cloudera Community

Re: YARN recommended configuration

Re: How to resolve for NULL values when they are c...

Re: Why is spark has better speed than Hadoop

Re: Is it possible to assign Hadoop queues to Hado...

Re: Kafka NiFi HDF Installation

Re: Parsing JSON data and Storing it in ORC Hive

HDFS Replication for SOLR 4.10.3

Re: SOLR Cross Data Center Replication version 4.1...

SOLR Cross Data Center Replication version 4.10.3

Re: kerbores not started

Re: Cloudbreak support with additional cloud vendo...

Re: Monitoring Ambari server

Re: JSONtoSQL with JSON containing an array with a...

Re: Do you know something like kaggle for Hadoop

Re: Hive Order By with Limits query performance o...