Member since: 04-26-2016
Posts: 6
Kudos Received: 3
Solutions: 0
10-04-2019
02:46 PM
Hello, I'm looking at your answer 3 years later because I'm in a similar situation :). In my company (a telco) we're planning to use 2 hot clusters with dual ingest because our RTO is demanding, and we're looking for mechanisms to monitor both clusters and keep them in sync. We ingest data in real time with Kafka + Spark Streaming, load it into HDFS and consume it with Hive/Impala. As a first approach I'm thinking of running simple counts over the Hive/Impala tables on both clusters every hour or half hour and comparing the results. If something is missing on one of the clusters, we will have to "manually" re-ingest the missing data (or copy it with Cloudera BDR from one cluster to the other) and re-process the enriched data. I'm wondering whether you have dealt with similar scenarios, or if you have any suggestions. Thanks in advance!
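To make the idea concrete, here is a rough sketch of the kind of periodic check I have in mind (the hostnames, database and table names are just placeholders, and in practice we would filter on the latest hour's partition rather than counting whole tables):

```python
#!/usr/bin/env python
# Hypothetical sketch only: compare row counts between two "hot" clusters.
# The hostnames and table names below are placeholders, not real endpoints.
import subprocess

CLUSTERS = {"cluster_a": "impala-a.example.com", "cluster_b": "impala-b.example.com"}
TABLES = ["telco.cdr_raw", "telco.cdr_enriched"]

def count_rows(impalad_host, table):
    """Run COUNT(*) through impala-shell (-B = plain delimited output) and return it as an int."""
    out = subprocess.check_output(
        ["impala-shell", "-i", impalad_host, "-B", "--quiet",
         # in a real check this query would be restricted to the last hour's partition
         "-q", "SELECT COUNT(*) FROM {0}".format(table)],
        universal_newlines=True)
    return int(out.strip().splitlines()[-1])

for table in TABLES:
    counts = {name: count_rows(host, table) for name, host in CLUSTERS.items()}
    if len(set(counts.values())) > 1:
        # this is where the "manual" re-ingest or BDR copy for the gap would be triggered
        print("MISMATCH on {0}: {1}".format(table, counts))
    else:
        print("OK {0}: {1} rows".format(table, list(counts.values())[0]))
```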
05-19-2016
06:59 PM
6 Kudos
Hi @David Lays ... wow, you don't want to know much in a single question! 🐵 I'll try to give you an overview of each one; if you want to go deeper on any of them, I suggest a separate question per topic area. First of all, the below applies to all environments, regardless of dev/test/pre-prod/prod.

HA of Masters
Do it. Full NN HA is production ready and I barely ever see clusters without it nowadays.

HA of edge nodes
Knox is stateless; to make it HA you just spin up multiple instances of it and put them behind a session-aware load balancer. All clients, wherever possible, should go through Knox or Ambari views. Ambari views can also be spun up in the same way as Knox (as described above).

Security
Kerberos - do it. Without Kerberos it doesn't matter what else you layer on top, your cluster is insecure. Think of a large safe with a big secure door on the front but no side walls... that is the kind of security you have without Kerberos.
Ranger - do it. Policies can be group based, and groups can be inherited from AD or LDAP.
Knox - strongly recommended. It also saves you from having to update clients each time your internal cluster services move around; they just keep talking to Knox, and Knox does the internal mappings.

Now you have questions about master and slave node mixing and the percentage of global data to store in each environment. What I would say here is that there is a very strong emerging pattern in a lot of organisations that guides the decisions you make. First, you still need Dev, Test, Pre-Prod, Prod etc., but that's for testing your infrastructure: whenever you upgrade to a new version of HDP, add a new generation or vendor of hardware, or update a third-party component such as SAS, you run that through your Dev/Test clusters.

When it comes to your user base, that's a very different conversation. With the data lake being a very real concept nowadays, data lakes being truly multi-tenant, and people being able to store and safely control access to a wide range of data, what we're seeing is that data scientists, developers and many other categories of users, including those that would usually have been on a separate scaled-down silo, are actually using resources on the production data lake. Their resource queues are managed so they can't impact production jobs or users, and in some cases they can only access anonymised data rather than data containing the full PII (personally identifiable information). But they can also test and develop their programs and hypotheses against a scale of data that just isn't possible in a "data lab".

One thing that you don't mention is DR (disaster recovery). We often see these assets also being used as areas for developers and data science users to work in, and in the event of a DR situation, a separate set of Capacity Scheduler queues is deployed so production workloads take precedence until the DR conditions are resolved.

Hope that helps. It's a complex situation, but this should set you on the right path. Good luck!
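To make that last point concrete, here is an illustrative Capacity Scheduler fragment (the queue names and percentages are made up, not a recommendation) showing how a production queue can be guaranteed the bulk of the cluster while dev/data-science queues are capped:

```xml
<!-- Illustrative capacity-scheduler.xml fragment: queue names and percentages are placeholders -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>production,analytics,datascience</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.production.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.analytics.capacity</name>
  <value>20</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.datascience.capacity</name>
  <value>10</value>
</property>
<property>
  <!-- cap how far the non-production queue can grow, so production workloads keep precedence -->
  <name>yarn.scheduler.capacity.root.datascience.maximum-capacity</name>
  <value>30</value>
</property>
```

In a DR scenario you would swap in a stricter version of these queue definitions (or adjust the percentages) so that production jobs failing over to this cluster always get their capacity first.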
05-20-2016
04:01 AM
1 Kudo
This is a superb distillation of things you should think about and do for a new cluster design and installation. Read through it carefully, as there are nuances and reasons behind his recommendations.
10-08-2016
11:49 AM
@David Lays Please let me know which Kafka design approach you finally went with: Kafka on the cluster nodes or a separate Kafka cluster. We are facing exactly the same design dilemma with regard to the Kafka installation for our cluster. Thanks very much in advance.