Explorer
Posts: 8
Registered: ‎11-29-2015

Is it possible to have hybrid deployment with AWS?

Let's say I already have a cluster running internally in my lab. Is it possible to add more hosts from AWS to act as DataNodes?

Cloudera Employee
Posts: 119
Registered: ‎10-13-2014

Re: Is it possible to have hybrid deployment with AWS?

It's technically possible, provided the networking setup allows bi-directional communication between the hosts in your lab and the ones in AWS. DNS should also work seamlessly across the two networks.
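
As a rough way to check the DNS part, here is a minimal Python sketch that verifies forward and reverse lookups agree for each host. It is only an illustration: the function name `check_dns` and the hostnames in the usage note are hypothetical, and it would need to be run from both the lab side and the AWS side to say anything about both networks.

```python
import socket


def check_dns(hostnames, resolve=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return {hostname: problem} for hosts whose forward and reverse lookups disagree.

    `resolve` and `reverse` default to the standard library resolvers; they are
    parameters only so the check can be exercised without a real network.
    """
    problems = {}
    for name in hostnames:
        try:
            ip = resolve(name)             # forward lookup: name -> IP
            reverse_name = reverse(ip)[0]  # reverse lookup: IP -> primary name
        except OSError as exc:             # gaierror and herror are OSError subclasses
            problems[name] = f"lookup failed: {exc}"
            continue
        # Compare case-insensitively and ignore a trailing dot.
        if reverse_name.lower().rstrip(".") != name.lower().rstrip("."):
            problems[name] = f"reverse lookup returned {reverse_name}"
    return problems
```

For example, `check_dns(["datanode1.lab.example.com", "datanode9.aws.example.com"])` (made-up names) returns an empty dict when every host resolves consistently from the machine where it is run.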

What's your use case? Why not run the entire cluster on AWS? The latency and limited bandwidth between the two sites are likely to significantly impact the performance of this hybrid cluster.

Explorer
Posts: 8
Registered: ‎11-29-2015

Re: Is it possible to have hybrid deployment with AWS?

I just want to see if it's possible to use the public cloud to handle spikes. The data is already in the internal lab, so moving it out would be inconvenient, especially since I don't want to deal with uploading it and then ingesting it into AWS.
Latency is a big deal, but I am wondering whether there is any case where it would make sense. I am just thinking out loud here. For example, what if the cluster is sustaining a long spike for "Tier 1" processes, but I want to be able to add more compute for other "Tier 2" processes? Then it would make sense to add new nodes instead of waiting.
Cloudera Employee
Posts: 119
Registered: ‎10-13-2014

Re: Is it possible to have hybrid deployment with AWS?

With a hybrid topology as described, it's highly unlikely that you will achieve an acceptable level of performance while shuffling data between the local environment and AWS. This is a guess - I don't have performance numbers to share. I would love to know more about your results if you get to try this out.

New Contributor
Posts: 2
Registered: ‎06-26-2017

Re: Is it possible to have hybrid deployment with AWS?

Does anyone have an update on this use case?  We would be interested to know if this "hybrid / bursting to the AWS cloud" architecture is realistic.

Cloudera Employee
Posts: 45
Registered: ‎02-18-2014

Re: Is it possible to have hybrid deployment with AWS?

Hi aamato76,

Andrei's original take on the idea still holds true today, as far as we've seen. Cloudera's general testing of different cluster configurations has found that even splitting a cluster across availability zones, with the whole cluster in AWS, can still lead to performance problems. Splitting across regions is worse, and is somewhat close to the hybrid architecture you're thinking about.

Here's Cloudera's reference architecture doc, by the way: http://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_aws.pdf

A different take on the idea is to have a separate cluster in AWS that can take on the additional workload, and set up focused data transfers of workload data from on-prem up to the cloud cluster and of results back from the cloud cluster. Maybe there's a work allocation system fronting both clusters that can send jobs to the local cluster by default, but out to the cloud cluster when the local one is overburdened. This would avoid individual job performance problems and probably reduce the data transfer costs into and out of AWS (if you have a VPN gateway set up, then those costs might be irrelevant anyway).
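
To make the "work allocation system" idea a bit more concrete, here is a minimal Python sketch of the routing decision only. Everything in it is an assumption for illustration: `ClusterLoad`, `pick_cluster`, and the 0.8 utilization threshold are hypothetical names and values, not part of any Cloudera product, and a real dispatcher would also have to account for the data transfers to and from the cloud cluster.

```python
from dataclasses import dataclass


@dataclass
class ClusterLoad:
    """Hypothetical snapshot of how busy a cluster is."""
    name: str
    pending_jobs: int
    capacity: int  # assumed: number of jobs the cluster can handle comfortably


def pick_cluster(local, cloud, threshold=0.8):
    """Return the name of the cluster that should receive the next job.

    Jobs go to the local cluster by default and to the cloud cluster only
    when local utilization reaches the (assumed) threshold.
    """
    if local.capacity <= 0:
        raise ValueError("local capacity must be positive")
    local_utilization = local.pending_jobs / local.capacity
    if local_utilization < threshold:
        return local.name
    return cloud.name
```

With `local = ClusterLoad("on-prem", 9, 10)` and `cloud = ClusterLoad("aws", 0, 10)`, `pick_cluster(local, cloud)` returns `"aws"`; with only 2 pending jobs on the local cluster it returns `"on-prem"`.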

So, hybrid architectures are realistic, but spanning single clusters between on-prem and cloud is not a great implementation.

New Contributor
Posts: 2
Registered: ‎06-26-2017

Re: Is it possible to have hybrid deployment with AWS?

Bill, thanks a lot. This is extremely helpful to us.