Created on 10-23-2015 12:47 AM - edited 09-16-2022 02:45 AM
UPDATED
- Converted into an article for more updated information. See
Created 10-23-2015 10:32 PM
The decision between page blobs and block blobs can be a bit more nuanced, at least when it comes to using Azure Blob storage for HDFS. This page provides a good overview: https://hadoop.apache.org/docs/current/hadoop-azure/index.html#Page_Blob_Support_and_Configuration.
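To make the page blob option a bit more concrete, here is a minimal Python sketch that builds the `core-site.xml` property the linked page describes for marking directories as page-blob backed. The property name `fs.azure.page.blob.dir` comes from the hadoop-azure documentation; the helper name and the directory list are only illustrative assumptions, so adjust them to your own layout.

```python
import xml.etree.ElementTree as ET


def page_blob_property(directories):
    """Build a <property> element listing the directories stored as page blobs.

    `directories` is a hypothetical list of paths; hadoop-azure expects them
    as one comma-separated value.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "fs.azure.page.blob.dir"
    ET.SubElement(prop, "value").text = ",".join(directories)
    return prop


# Example directories (assumption): HBase write-ahead logs are a typical
# candidate for page blobs.
fragment = ET.tostring(
    page_blob_property(["/hbase/WALs", "/hbase/oldWALs"]),
    encoding="unicode",
)
print(fragment)
```

The printed fragment would go inside the `<configuration>` element of `core-site.xml`; everything outside the listed directories stays on block blobs.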
Created 12-10-2015 10:06 PM
I think it's also a good starting point to use availability sets for master nodes and worker nodes. Another good practice is to use one storage account per node in the cluster, to avoid hitting the per-account IOPS limits when multiple VMs share the same storage account. You can also try Azure Data Lake Store (with adl://) to check the performance of the new Azure service.
You also need to keep in mind the maintenance windows of each Azure region relative to where your customers are: some regions can be a good choice for new service availability (e.g. US East 2) but not from a maintenance point of view (especially for European customers).
We also saw large differences between IaaS and PaaS (HDInsight) performance, due to the low read/write performance of Blob Storage: IaaS, configured correctly, gives the best performance.
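As a rough illustration of the IOPS reasoning behind spreading disks over several storage accounts, here is a small Python sketch. The two limits are assumptions based on the standard-storage figures published around the time of this thread (20,000 IOPS per account, 500 IOPS per standard disk), not values taken from this post, so check the current Azure limits before relying on them.

```python
import math

# Assumed limits for standard storage; verify against current Azure documentation.
ACCOUNT_IOPS_LIMIT = 20_000  # assumption: IOPS ceiling per storage account
DISK_IOPS_LIMIT = 500        # assumption: IOPS ceiling per standard data disk


def max_disks_per_account(account_iops=ACCOUNT_IOPS_LIMIT, disk_iops=DISK_IOPS_LIMIT):
    """Number of fully loaded disks one account can serve before throttling."""
    return account_iops // disk_iops


def min_accounts_needed(nodes, disks_per_node):
    """Minimum number of storage accounts so that no account is oversubscribed."""
    total_disks = nodes * disks_per_node
    return math.ceil(total_disks / max_disks_per_account())


# Hypothetical cluster: 100 worker nodes with 10 data disks each.
print(max_disks_per_account())
print(min_accounts_needed(nodes=100, disks_per_node=10))
```

Under these assumptions one account per node is more than the strict minimum, but it keeps each node's disks well below the account ceiling and makes the layout easy to reason about.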
Created 12-15-2015 04:37 AM
Good point, but I have a follow-up question. How did you get past the storage account limit, given that it's capped at 100 storage accounts per subscription? Did you get it increased by Microsoft, or are you using multiple subscriptions for 100+ node clusters?
Created 12-15-2015 07:28 AM
@Pardeep Yes, we had to ask Microsoft Azure Support to increase the limits for both cores and storage accounts. See this link: https://azure.microsoft.com/en-us/blog/azure-limits-quotas-increase-requests/
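For planning the support request, a quick Python sketch like the one below can estimate how far a one-account-per-node layout overshoots the subscription quota. The default of 100 is the limit quoted in this thread; the function name and the example numbers are hypothetical.

```python
SUBSCRIPTION_ACCOUNT_LIMIT = 100  # default per-subscription limit quoted in this thread


def extra_accounts_required(nodes, accounts_in_use=0, limit=SUBSCRIPTION_ACCOUNT_LIMIT):
    """Accounts beyond the quota needed when every node gets its own storage account.

    `accounts_in_use` covers storage accounts the subscription already holds
    for other purposes (assumption: they count against the same quota).
    """
    return max(0, nodes + accounts_in_use - limit)


# Hypothetical 120-node cluster on a subscription with no other storage accounts.
print(extra_accounts_required(120))
```

A result above zero is the increase to request from Azure Support; zero means the layout fits within the default quota.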
Created 12-16-2015 12:22 PM
@Andrea D'Orio do you have any benchmarking data on Azure IaaS vs. PaaS that you would be willing to share?
Created 12-15-2015 11:21 PM
I would recommend DS14 with 10 SSD disks for worker nodes. Blob storage would be good for cold backup, but I don't see much value in using it for HDP workloads.
Created 01-12-2016 03:41 PM
I've been told that the DV2 series is now recommended over the A series. Shall we update this? https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-size-specs/#standard-tier-...
Created 01-12-2016 06:53 PM
Mind if we convert this to an Article and update together since no answer will be correct for more than a couple months?
Created 03-10-2016 05:34 PM