Scaling and Auto-scaling of HDP on AWS and Azure Cloud


My understanding, along with my questions, is below.

AWS-HDCloud

Manual scaling using Ambari or the AWS UI is possible.

Auto Scaling

1. Is auto-scaling possible with this option (can I set an auto-scaling group while creating the cluster)?

1.1. If so, how is the data rebalanced? That is, if a new node is added, compute on that node may not gain data locality.

--------------------------------------------------------------------------------------------------------------------------------------------------------------

AWS-HDP on IaaS

Manual scaling using Ambari is possible.

Auto Scaling-Without Cloudbreak

2. Is auto-scaling possible with this option (can I set an auto-scaling group while creating the cluster)?

2.1. If so, how is the data rebalanced? That is, if a new node is added, compute on that node may not gain data locality.

Auto Scaling-With Cloudbreak

Auto-scaling may be possible, but question 2.1 applies here as well.

--------------------------------------------------------------------------------------------------------------------------------------------------------------

Azure-HDInsight

Manual scaling using Ambari or the Azure UI is possible.

Auto Scaling

3. Is auto-scaling possible with this option (can I set an auto-scaling group while creating the cluster)?

3.1. If so, how is the data rebalanced? That is, if a new node is added, compute on that node may not gain data locality.

--------------------------------------------------------------------------------------------------------------------------------------------------------------

Azure-HDP in MarketPlace

Manual scaling using Ambari or the Azure UI is possible.

Auto Scaling

4. Is auto-scaling possible with this option (can I set an auto-scaling group while creating the cluster)?

4.1. If so, how is the data rebalanced? That is, if a new node is added, compute on that node may not gain data locality.

--------------------------------------------------------------------------------------------------------------------------------------------------------------

Azure-HDP on IaaS

The same questions as for AWS-HDP on IaaS apply here.

1 ACCEPTED SOLUTION

@learninghuman

To state it most simply, auto-scaling is currently a capability of Cloudbreak only. With Cloudbreak Periscope, you can define a scaling policy and apply it to any alert on any Ambari metric. Scaling granularity is at the Ambari host group level, which gives you the option to scale only specific services or components rather than the whole cluster. Per your questions above, if you use Cloudbreak to provision HDP on either Azure IaaS or AWS IaaS, you can use the auto-scaling capabilities it provides. Both Azure HDInsight (HDI) and Hortonworks Data Cloud for AWS (HDC) make it very easy to manually resize your cluster through their respective consoles, but auto-scaling is not a feature of either offering at this point in time.
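To make the host-group idea concrete, here is a minimal Python sketch of how a policy tied to an alert might resize a single host group. It is only an illustration under stated assumptions: `ScalingPolicy`, `apply_policy`, the alert name `yarn_pending_containers`, and the node limits are all hypothetical and are not the Periscope API.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical model of a scaling policy (not the Periscope API)."""
    alert_name: str   # alert that triggers the policy (hypothetical name below)
    host_group: str   # Ambari host group the policy resizes
    adjustment: int   # nodes to add (positive) or remove (negative)
    min_nodes: int    # lower bound for the host group
    max_nodes: int    # upper bound for the host group


def apply_policy(policy, fired_alerts, node_counts):
    """Return new per-host-group node counts after evaluating one policy."""
    updated = dict(node_counts)
    if policy.alert_name in fired_alerts:
        target = updated[policy.host_group] + policy.adjustment
        # Clamp the target to the configured bounds.
        updated[policy.host_group] = max(policy.min_nodes,
                                         min(policy.max_nodes, target))
    return updated


policy = ScalingPolicy("yarn_pending_containers", "worker", 2, 3, 10)
counts = {"master": 1, "worker": 9}
new_counts = apply_policy(policy, {"yarn_pending_containers"}, counts)
print(new_counts)
```

Only the `worker` host group changes in this sketch; the `master` group is left alone, which is the point of scaling at host group granularity rather than resizing the whole cluster.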

With regard to data rebalancing, neither HDI nor HDC needs to be concerned with this, because both are automatically configured to use cloud storage (currently ADLS and S3, respectively) rather than HDFS. For HDP deployed on IaaS with Cloudbreak, auto-scaling may perform an HDFS rebalance, but only after a downscale operation. To keep HDFS healthy during downscale, Cloudbreak always preserves the configured replication factor and makes sure that there is enough space on HDFS to rebalance the data. To minimize rebalancing, replication, and HDFS storms during downscale, Cloudbreak checks block locations and computes the least costly operations.
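The downscale checks described above can be sketched roughly as follows. This is not Cloudbreak's actual algorithm: the function `plan_downscale`, the node names, and the sizes are hypothetical, and the amount of data stored on each DataNode is used as a simple stand-in for the block-location analysis Cloudbreak performs.

```python
def plan_downscale(used_gb, capacity_gb, remove_count, replication_factor):
    """Pick DataNodes to remove, or raise ValueError if the downscale is unsafe.

    used_gb and capacity_gb map a DataNode name to gigabytes (hypothetical input).
    """
    if not 0 <= remove_count <= len(used_gb):
        raise ValueError("remove_count is out of range")

    # "Least costly" here means: remove the nodes holding the least data,
    # so the fewest blocks have to be re-replicated elsewhere.
    candidates = sorted(used_gb, key=lambda n: (used_gb[n], n))[:remove_count]
    remaining = [n for n in used_gb if n not in candidates]

    # The configured replication factor must still be satisfiable.
    if len(remaining) < replication_factor:
        raise ValueError("too few DataNodes left for the replication factor")

    # The remaining nodes must have room for the data that moves.
    data_to_move = sum(used_gb[n] for n in candidates)
    free_space = sum(capacity_gb[n] - used_gb[n] for n in remaining)
    if data_to_move > free_space:
        raise ValueError("not enough free HDFS space to re-replicate the data")

    return candidates


used = {"dn1": 400, "dn2": 100, "dn3": 250, "dn4": 50}
capacity = {"dn1": 500, "dn2": 500, "dn3": 500, "dn4": 500}
to_remove = plan_downscale(used, capacity, remove_count=1, replication_factor=3)
print(to_remove)
```

With a replication factor of 3, asking this sketch to remove two of the four nodes raises `ValueError`, because only two DataNodes would remain.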

@learninghuman If this answer helps, please accept it. Otherwise, I'd be happy to answer any remaining questions you have.

Thanks! _Tom