Recommendations for Microsoft Azure HDP Deployment

UPDATED

- Converted into an article with more up-to-date information. See:

https://community.hortonworks.com/articles/22376/recommendations-for-microsoft-azure-hdp-deployment-...

Accepted Solutions

Re: Recommendations for Microsoft Azure HDP Deployment

Contributor

I think it's also a good starting point to use availability sets for master nodes and worker nodes. Another good practice is to use one storage account per node in the cluster, so that you avoid the IOPS limits that apply when multiple VMs share the same storage account. You can also try Azure Data Lake Store (with adl://) to check the performance of the new Azure service.

You also need to keep in mind the maintenance windows of each Azure region relative to where your customers are: some regions could be a good choice for new service availability (e.g. US East 2) but not from a maintenance point of view (especially for European customers).

We also observed large differences between IaaS performance and PaaS (HDInsight) performance, due to the low read/write performance of Blob Storage: with IaaS (configured correctly) you can achieve the best performance.
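
The one-storage-account-per-node idea above can be sketched as a simple naming plan. This is only an illustration: the `hdp` prefix, the node names, and the helper `plan_storage_accounts` are hypothetical, and the sketch builds names locally without calling any Azure API. The one rule it enforces is Azure's naming constraint for storage accounts (3 to 24 characters, lowercase letters and digits only).

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
_VALID_NAME = re.compile(r"[a-z0-9]{3,24}")


def plan_storage_accounts(node_names, prefix="hdp"):
    """Map each node to its own storage account name.

    The naming scheme (prefix + zero-padded index + "sa") is hypothetical;
    pick whatever convention fits your subscription.
    """
    plan = {}
    for index, node in enumerate(node_names):
        account = f"{prefix}{index:03d}sa"
        if not _VALID_NAME.fullmatch(account):
            raise ValueError(f"invalid storage account name: {account!r}")
        plan[node] = account
    return plan


if __name__ == "__main__":
    # Example node names, made up for this sketch.
    nodes = ["master1", "master2", "worker1", "worker2", "worker3"]
    for node, account in plan_storage_accounts(nodes).items():
        print(node, "->", account)
```

Because each node maps to a distinct account, the disks of different VMs never compete for the same account-level IOPS budget, which is the point of the recommendation.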

Re: Recommendations for Microsoft Azure HDP Deployment

New Contributor

The decision between page blobs and block blobs can be a bit more nuanced, at least when it comes to using Azure Blob storage for HDFS. This page provides a good overview: https://hadoop.apache.org/docs/current/hadoop-azure/index.html#Page_Blob_Support_and_Configuration.
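
As a rough illustration of the configuration that page describes: the hadoop-azure driver reads a comma-separated list of directories from the `fs.azure.page.blob.dir` property, and files created under those directories are stored as page blobs, while everything else uses block blobs. The Python sketch below only mimics that decision with a simple path-prefix check. It is not the driver's actual code, and the helper name and example directories are assumptions.

```python
def is_page_blob_path(path, page_blob_dirs):
    """Return True if `path` falls under one of the configured directories.

    `page_blob_dirs` is the comma-separated value one would set for
    fs.azure.page.blob.dir. This is a simplified prefix match, an
    approximation of the behaviour and not the real driver logic.
    """
    dirs = [d.strip().rstrip("/") for d in page_blob_dirs.split(",") if d.strip()]
    return any(path == d or path.startswith(d + "/") for d in dirs)


if __name__ == "__main__":
    configured = "/hbase/WALs,/hbase/oldWALs"  # example value, an assumption
    for p in ["/hbase/WALs/region1/log.1", "/hbase/data/table1", "/hbase/WALsX/f"]:
        kind = "page blob" if is_page_blob_path(p, configured) else "block blob"
        print(p, "->", kind)
```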

Re: Recommendations for Microsoft Azure HDP Deployment

@Andrea D'Orio

Good point, but I have a follow-up question. How did you get past the storage account limit, given that it's capped at 100 storage accounts per subscription? Did you get it increased by Microsoft, or are you using multiple subscriptions for clusters of 100+ nodes?

Re: Recommendations for Microsoft Azure HDP Deployment

Contributor

@Pardeep Yes, we had to ask Microsoft Azure Support to increase the limits for both cores and storage accounts. Look at this link: https://azure.microsoft.com/en-us/blog/azure-limits-quotas-increase-requests/
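
To put numbers on the question above: if every node gets its own storage account and the limit is 100 storage accounts per subscription (the figure quoted in this thread; Azure limits change over time), a quick calculation shows when a quota increase or extra subscriptions become necessary. The function name and the example cluster sizes are illustrative only.

```python
import math


def subscriptions_needed(node_count, accounts_per_node=1, limit_per_subscription=100):
    """Estimate how many subscriptions are needed without a quota increase.

    The default limit of 100 is the value mentioned in this thread, not a
    number looked up from current Azure documentation.
    """
    total_accounts = node_count * accounts_per_node
    return max(1, math.ceil(total_accounts / limit_per_subscription))


if __name__ == "__main__":
    for nodes in (50, 100, 101, 250):
        print(nodes, "nodes ->", subscriptions_needed(nodes), "subscription(s)")
```

Anything above one subscription in this estimate is the point at which you would either request a higher limit from Azure Support, as described above, or split the cluster's storage across subscriptions.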

Re: Recommendations for Microsoft Azure HDP Deployment

New Contributor

@Andrea D'Orio do you have any benchmarking data on Azure IaaS vs. PaaS that you would be willing to share?

Re: Recommendations for Microsoft Azure HDP Deployment

I would recommend DS14 with 10 SSD disks for worker nodes. Blob storage would be good for cold backup, but I don't see much value in using it for HDP workloads.

Re: Recommendations for Microsoft Azure HDP Deployment

I've been told that the Dv2 series is now recommended over the A series. Shall we update this? https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-size-specs/#standard-tier-...

Re: Recommendations for Microsoft Azure HDP Deployment

Mind if we convert this to an article and update it together, since no answer will be correct for more than a couple of months?

Re: Recommendations for Microsoft Azure HDP Deployment

@Sean Roberts

Good idea. Let me convert it into an article.