07-12-2018 06:16 AM
Synchronizing data between clusters can be accomplished via distcp, BDR, or by ingesting data into both clusters simultaneously using third-party tools. The best tool depends on your use case, risk tolerance, and budget.

We don't recommend spanning a cluster across large geographic regions (e.g. US to EU). Network latency and bandwidth between such regions are usually not suitable and could easily result in the slow query times you're experiencing.

We DO support spanning clusters across AWS Availability Zones if certain conditions are met; see Appendix A of the Cloudera Enterprise Reference Architecture for AWS Deployments (PDF) for details. For comparison, the latency between AWS AZs is typically sub-millisecond.

Spanning bare-metal clusters across multiple data centers will be addressed in the next release of the Cloudera Enterprise Reference Architecture for Bare Metal Deployments (PDF), to coincide with C6. It will look similar to the AWS guidance, but with the additional caveat that network latency between sides should not exceed 10 ms. Also keep in mind that Kudu does not support rack awareness, and not all services provide HA.
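As a rough illustration of the distcp route mentioned above, the Python sketch below only builds the argument list for an incremental `hadoop distcp` copy from one cluster to another. The helper name `build_distcp_command`, the NameNode URIs, and the default map count are hypothetical; the sketch does not cover BDR or third-party dual ingest, and scheduling and error handling are left out.

```python
def build_distcp_command(src: str, dst: str, num_maps: int = 20,
                         delete_extra: bool = False) -> list[str]:
    """Build an argv list for an incremental `hadoop distcp` copy (hypothetical helper).

    -update  copies files that are missing at dst or differ from src
             (note: with -update, the *contents* of src are copied into dst,
             not the src directory itself).
    -m N     caps the number of simultaneous copy (map) tasks.
    -delete  removes files at dst that no longer exist at src; distcp only
             accepts it together with -update or -overwrite.
    """
    if num_maps < 1:
        raise ValueError("num_maps must be positive")
    cmd = ["hadoop", "distcp", "-update", "-m", str(num_maps)]
    if delete_extra:
        cmd.append("-delete")
    cmd += [src, dst]
    return cmd

# Usage (hypothetical NameNode addresses), e.g. from a scheduled job:
#   import subprocess
#   subprocess.run(build_distcp_command("hdfs://nn-a:8020/data",
#                                       "hdfs://nn-b:8020/data"), check=True)
```

Returning an argv list rather than a shell string avoids quoting problems when paths contain spaces and keeps the command easy to log or inspect before running it.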
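To get a first impression of whether two data centers fit under the 10 ms guideline, one rough approach is to time TCP handshakes from a node on one side to a node on the other. The Python sketch below assumes that TCP connect time is an acceptable proxy for network round-trip latency (it is not a Cloudera tool, and a proper network test would use dedicated tooling); the helper names `tcp_connect_rtt_ms`, `within_latency_budget`, and `summarize` are hypothetical.

```python
import socket
import statistics
import time

# Ceiling taken from the bare-metal guidance: latency between sides <= 10 ms.
MAX_LATENCY_MS = 10.0


def tcp_connect_rtt_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Approximate one network round trip by timing a TCP handshake to host:port."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close immediately
    return (time.perf_counter() - start) * 1000.0


def within_latency_budget(samples_ms: list[float],
                          limit_ms: float = MAX_LATENCY_MS) -> bool:
    """True if the worst observed sample stays at or under the limit."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    return max(samples_ms) <= limit_ms


def summarize(samples_ms: list[float]) -> dict:
    """Median, worst case, and pass/fail against the 10 ms ceiling."""
    return {
        "median_ms": statistics.median(samples_ms),
        "max_ms": max(samples_ms),
        "ok": within_latency_budget(samples_ms),
    }
```

For example, collect `samples = [tcp_connect_rtt_ms("remote-node.example.com", 22) for _ in range(20)]` (placeholder host and port) and pass them to `summarize(samples)`. Judging on the maximum rather than the median is deliberately conservative, since the guideline is phrased as a ceiling rather than an average.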