Recommendations for Microsoft Azure HDP Deployment

UPDATED: Converted into an article with more up-to-date information. See https://community.hortonworks.com/articles/22376/recommendations-for-microsoft-azure-hdp-deployment-...

1 ACCEPTED SOLUTION

Rising Star

I think it's also a good starting point to use availability sets for master nodes and worker nodes. Another good practice is to use one storage account per node in the cluster, in order to avoid the IOPS limits that apply when multiple VMs share the same storage account. You can also try Azure Data Lake Store (with adl://) to check the performance of the new Azure service.

You also need to keep in mind the maintenance windows of each Azure region relative to where your customers are: some regions can be a good choice for new service availability (e.g. US East 2) but not from a maintenance point of view (especially for European customers).

We also measured large differences between IaaS performance and PaaS (HDInsight) performance, due to the low read/write performance of Blob Storage: with IaaS (configured correctly) you can achieve the best performance.

9 REPLIES

Contributor

The decision between page blobs and block blobs can be a bit more nuanced, at least when it comes to using Azure Blob storage for HDFS. This page provides a good overview: https://hadoop.apache.org/docs/current/hadoop-azure/index.html#Page_Blob_Support_and_Configuration
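
As a rough illustration of what that page describes, the hadoop-azure documentation lists a `fs.azure.page.blob.dir` property that names the directories whose files should be stored as page blobs. The sketch below only builds the corresponding `core-site.xml` property fragment; the helper name and the example directories are hypothetical, so check the linked page before using them on a real cluster.

```python
import xml.etree.ElementTree as ET


def page_blob_property(directories):
    """Build a <property> element for fs.azure.page.blob.dir.

    `directories` is a list of paths whose files should be stored as
    page blobs; the property value is a comma-separated list.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "fs.azure.page.blob.dir"
    ET.SubElement(prop, "value").text = ",".join(directories)
    return prop


# Hypothetical example directories (assumption, not from this thread).
fragment = ET.tostring(
    page_blob_property(["/hbase/WALs", "/hbase/oldWALs"]),
    encoding="unicode",
)
print(fragment)
```

The printed fragment would go inside the `<configuration>` element of `core-site.xml`; paths not listed there keep the default block blob behavior.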

@Andrea D'Orio

Good point, but I have a follow-up question. How did you get past the storage account limit, given that it is capped at 100 storage accounts per subscription? Did you get it increased by Microsoft, or are you using multiple subscriptions for clusters of 100+ nodes?

Rising Star

@Pardeep Yes, we had to ask Microsoft Azure Support to increase the limits for both cores and storage accounts. See this link: https://azure.microsoft.com/en-us/blog/azure-limits-quotas-increase-requests/
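
To make the arithmetic behind this exchange concrete, here is a minimal sketch that checks whether a cluster using one storage account per node fits under the limit of 100 storage accounts per subscription mentioned above. The function name and the returned fields are hypothetical, and the limit is passed as a parameter because quotas change over time and can be raised through a support request.

```python
import math

# Limit quoted in this thread; treat it as an assumption that may be outdated.
DEFAULT_STORAGE_ACCOUNT_LIMIT = 100


def storage_account_plan(node_count, accounts_per_node=1,
                         limit=DEFAULT_STORAGE_ACCOUNT_LIMIT):
    """Estimate storage account needs for a cluster.

    Returns how many accounts are required, whether that exceeds the
    per-subscription limit, and how many subscriptions would be needed
    if the limit were left unchanged.
    """
    required = node_count * accounts_per_node
    return {
        "required": required,
        "limit": limit,
        "needs_increase": required > limit,
        "subscriptions_at_current_limit": math.ceil(required / limit),
    }


plan = storage_account_plan(node_count=120)
print(plan)
```

For a 120-node cluster this reports that 120 accounts are required, which exceeds the limit of 100, so either a quota increase or a second subscription would be needed.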

New Contributor

@Andrea D'Orio do you have any benchmarking data on Azure IaaS vs. PaaS that you would be willing to share?

DS14 with 10 SSD disks is what I would recommend for worker nodes. Blob storage would be good for cold backup, but I don't see much value in using it for HDP workloads.

I've been told that the Dv2 series is now recommended over the A series. Shall we update this? https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-size-specs/#standard-tier-...

Mind if we convert this to an article and update it together, since no answer will be correct for more than a couple of months?

@Sean Roberts

Good idea. Let me convert this into an article.