HDP advantages on Hybrid Cloud

Contributor

The table at the linked page lists four factors: scalability, elasticity, protection of sensitive data, and unlimited storage. I have a few questions on how these are achieved.

First and foremost, my understanding is that the hybrid options are the following. Please correct me as appropriate.

1. HDP deployed on OpenStack using Cloudbreak on-premise. The cluster can auto-scale to the cloud based on set metrics.

2. HDP on-premise (no OpenStack, so no Cloudbreak) plus nodes from a VPC in AWS or a virtual network (VN) in Azure added to the same cluster.

Now in option 1,

How are the four factors mentioned above achieved? With the data on-premise, if the cluster auto-scales, the newly added compute nodes hold no HDFS blocks and so may not achieve data locality, which hurts performance. If an HDFS rebalance is run, blocks can move to the cloud nodes, and the sensitive data is compromised.
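To make the data-locality concern concrete, here is a minimal sketch in Python. It is a toy model, not HDFS or YARN code: the node names, block IDs, and the `node_local_fraction` helper are all hypothetical, and the task placement is hard-coded for illustration rather than chosen by a real scheduler.

```python
def node_local_fraction(replicas, assignment):
    """Fraction of tasks that run on a node holding a replica of their block.

    replicas:   block id -> set of nodes that store a replica (hypothetical layout)
    assignment: block id -> node where the task reading that block runs
    """
    if not assignment:
        return 0.0
    local = sum(
        1 for block, node in assignment.items()
        if node in replicas.get(block, set())
    )
    return local / len(assignment)


# All replicas live on on-premise nodes; the cloud nodes store no blocks.
replicas = {
    "b1": {"onprem-1", "onprem-2"},
    "b2": {"onprem-2", "onprem-3"},
    "b3": {"onprem-1", "onprem-3"},
    "b4": {"onprem-1", "onprem-2"},
}

# Assumed placement before scaling: every task lands next to its data.
before = {"b1": "onprem-1", "b2": "onprem-2", "b3": "onprem-3", "b4": "onprem-1"}

# Assumed placement after scaling: two tasks land on the new compute-only cloud nodes.
after = {"b1": "onprem-1", "b2": "cloud-1", "b3": "onprem-3", "b4": "cloud-2"}

print(node_local_fraction(replicas, before))
print(node_local_fraction(replicas, after))
```

In this toy layout, any task placed on a cloud node has to read its block over the network, because no replica exists there unless a rebalance moves one, which is exactly the sensitive-data concern.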

How do you ensure that the sensitive data is kept on-premise and never moved to the cloud? That would mean on-premise data cannot be processed in the cloud. How is this enforced?

Option 2,

How are the four factors mentioned above achieved?

Is auto-scaling possible in this option? If not, is manual intervention required to scale the cluster?

How do you ensure that the sensitive data is kept on-premise and never moved to the cloud? That would mean on-premise data cannot be processed in the cloud. How is this enforced?

5 REPLIES

Re: HDP advantages on Hybrid Cloud

Super Guru
@learninghuman

I think there is a bit of confusion. Hybrid cloud, in the paper you have linked, does not mean you can extend your existing on-prem cluster to the cloud. The details on pages 5 and 6 describe how the architecture works. For example:

Interoperability

A hybrid cloud must provide full interoperability and platform deployment choices across a variety of operating systems and infrastructure platforms. A robust hybrid cloud architecture is platform agnostic and in this way, the enterprise is assured of seamless business continuity no matter which environments or platforms they are working with. Enterprise Hadoop platforms should facilitate hybrid cloud deployment models with features that enable tethering. Tethering features allow Hadoop clusters to exchange data and workloads between on-premises and cloud deployments in either a manual or automated fashion but ultimately in a fully automated and seamless manner. This drives seamless interoperability beyond operating system and software version compatibility and eases the job of the operations manager.

You can find additional details elsewhere in the paper. For example, on page 3, under Introduction to Hybrid Cloud, you can find the following:

Hybrid clouds ease the process of migrating workloads from on-premises to the cloud. Data can be replicated across both the public cloud and on-premises to ensure business continuity. Enterprises can more efficiently deal with peak performance periods by allocating resources in the public cloud when needed without having to make additional investments in their on premises infrastructure.

Your options 1 and 2 assume extending an on-premise cluster to the cloud elastically. That is not possible.

Re: HDP advantages on Hybrid Cloud

Contributor

@mqureshi Thanks. As per link1 and link2, option 1 may be possible: if HDP is deployed using Cloudbreak on-premise with OpenStack, then it can scale to the cloud (same cluster). Please clarify.

One more question: can I create a single HDP cluster with a few nodes on-premise and a few nodes from a VPC in AWS or a VN in Azure? I am not talking about elasticity or auto-scaling here; I just want to know if I can create a VPC or VN and then create a cluster with 10 nodes on-premise and 20 nodes from the VPC or VN.

Re: HDP advantages on Hybrid Cloud

Super Guru

Greg Keys gave a really good, detailed answer on link 2. I am not sure how Periscope data comes into play. I think it's a SQL tool. Not sure how it's relevant to the deployment question.

Like Greg mentioned, if you have OpenStack in house, then you can use Cloudbreak to deploy your clusters and scale up or down.

One thing I want to make sure is clear is that you cannot expand your cluster across multiple data centers. That is not supported today and I don't know of anyone who is doing this today.

Re: HDP advantages on Hybrid Cloud

Contributor

@mqureshi I thought Periscope helps to set threshold metrics for auto-scaling. It's now integrated with Cloudbreak so that HDP deployed using Cloudbreak on AWS, Azure, OpenStack, etc. can auto-scale. That's my understanding at least.
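As a rough illustration of what threshold-based auto-scaling means in general, here is a minimal Python sketch. It is not Periscope's or Cloudbreak's actual API or configuration schema: `ScalingPolicy`, its fields, and `desired_node_count` are hypothetical names chosen for this example, and the metric is assumed to be a single number such as the fraction of cluster memory in use.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical fields, not a real Periscope/Cloudbreak schema.
    metric_threshold: float  # scale when the metric rises above this value
    adjustment: int          # number of nodes to add when the threshold is crossed
    min_nodes: int           # lower bound on cluster size
    max_nodes: int           # upper bound on cluster size


def desired_node_count(current_nodes: int, metric_value: float,
                       policy: ScalingPolicy) -> int:
    """Return the node count the cluster should have after one evaluation."""
    if metric_value <= policy.metric_threshold:
        # Threshold not crossed: leave the cluster size unchanged.
        return current_nodes
    target = current_nodes + policy.adjustment
    # Clamp the result to the configured cluster size limits.
    return max(policy.min_nodes, min(policy.max_nodes, target))


# Assumed example policy: add 2 nodes when the metric exceeds 0.8, within 3..10 nodes.
policy = ScalingPolicy(metric_threshold=0.8, adjustment=2, min_nodes=3, max_nodes=10)
print(desired_node_count(4, 0.9, policy))
```

Note that a policy like this only decides how many nodes to run. Where the new nodes are provisioned (same data center or elsewhere) is a separate deployment question, which is the point under discussion in this thread.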

Re: HDP advantages on Hybrid Cloud

Super Guru

@learninghuman

That might be true. I am not very familiar with Periscope. Regardless, a cluster will not span multiple data centers. And even if you make it work (which shouldn't be hard), it will not be supported.