Microsoft Azure General Sizing Guidelines

  1. Size and price compute (the VMs) and storage separately.
  2. Use Linux VMs on Azure (not to be confused with the Ubuntu beta offering on HDInsight).
  3. If performance is a must, especially with Kafka and Storm, use Premium Storage rather than Standard. Make sure to request Premium Storage (see link below).
  4. Do not use A8 machines; use A10s or A11s instead. The A8 is backed by InfiniBand, which is more expensive and unnecessary for Hadoop.
  5. The D series and the newer D_v2 series are recommended if local solid-state disks are needed.
  6. For Premium Storage, use the DS_v2 series.
  7. Page blob storage is recommended for HBase, as opposed to block blob storage. See link below.
  8. Both options will need attached blob storage; the 382 GB local disk that comes with the VM is only for temporary storage.
  9. Blob storage data disks come in sizes up to 1023 GB (roughly 1 TB). Each VM size has a maximum number of data disks that can be attached; e.g., A10 VMs can attach at most 16 × 1 TB disks. See the links below for more details, and the sizing sketch after this list.
  10. Use availability sets for master and worker nodes.
  11. Use one storage account for every node in the cluster to avoid the IOPS limits that apply when multiple VMs share the same storage account.
  12. You can also try Azure Data Lake Store (via the adl:// scheme) to evaluate the performance of this newer Azure service.
  13. Also keep in mind the maintenance windows of each Azure region relative to your customers: some regions are a good choice for new-service availability (e.g., US East 2) but not from a maintenance point of view (especially for European customers).
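
To make the disk arithmetic in item 9 concrete, here is a minimal Python sketch. The 1023 GB disk size and the 16-disk A10 attach limit come from the list above; the HDFS replication factor of 3 and the 50 TB target are illustrative assumptions, not figures from this article.

```python
import math

MAX_DISK_GB = 1023        # data disks attach in sizes up to 1023 GB (~1 TB), per item 9
DISKS_PER_A10_VM = 16     # per-VM attach limit quoted above for A10 VMs
HDFS_REPLICATION = 3      # assumed HDFS default replication factor (not from this article)

def a10_workers_needed(usable_tb: float) -> int:
    """A10-class worker count needed to hold `usable_tb` TB of usable HDFS data."""
    raw_gb = usable_tb * 1024 * HDFS_REPLICATION     # replication multiplies raw bytes
    disks = math.ceil(raw_gb / MAX_DISK_GB)          # number of 1023 GB disks required
    return math.ceil(disks / DISKS_PER_A10_VM)       # respect the 16-disk attach limit

# Example: 50 TB usable -> 151 disks -> 10 workers (and, per item 11,
# plan on one storage account per node).
print(a10_workers_needed(50))
```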

---------------------------------------

Recommendation 1 - Best Compute performance for Batch and Real Time Use Cases

  1. For head/master nodes, use:
    1. Standard_D13_v2 (8 CPU, 56 GB) or
    2. Standard_D5_v2 (16 CPU, 56 GB) or
    3. Standard_D14_v2 (16 CPU, 112 GB)
  2. For data nodes, use:
    1. Standard_D14_v2 (16 CPU, 112 GB) or
    2. Standard_DS14_v2 (16 CPU, 112 GB with Premium Storage) or
    3. Standard_DS15_v2 (20 CPU, 140 GB with Premium Storage)
  3. If testing Kafka and Storm, use Standard_DS13_v2, Standard_DS14_v2, or Standard_DS15_v2 with Premium Storage, especially if performance is needed to meet SLAs (see the selection sketch after this list).
  4. Pros: CPU is about 35% faster than the D series; local SSD disks; VMs are cheaper per hour than the A or D series.
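
As a quick way to compare the sizes above, here is a hedged Python sketch encoding only the specs quoted in this recommendation. The vCPU/RAM numbers come from the list; treating "Kafka/Storm with SLAs" as the trigger for a Premium Storage DS size is this sketch's simplification of item 3.

```python
# Sizes quoted in Recommendation 1: (vCPUs, RAM in GB, Premium Storage support).
REC1_SIZES = {
    "Standard_D13_v2":  (8,  56,  False),
    "Standard_D5_v2":   (16, 56,  False),
    "Standard_D14_v2":  (16, 112, False),
    "Standard_DS13_v2": (8,  56,  True),
    "Standard_DS14_v2": (16, 112, True),
    "Standard_DS15_v2": (20, 140, True),
}

def pick_size(min_cpu: int, min_ram_gb: int, needs_premium: bool) -> str:
    """Return the smallest listed size meeting the CPU/RAM floor and storage need."""
    candidates = [
        (cpu, ram, name)
        for name, (cpu, ram, premium) in REC1_SIZES.items()
        if cpu >= min_cpu and ram >= min_ram_gb and premium == needs_premium
    ]
    if not candidates:
        raise ValueError("No size in Recommendation 1 fits these requirements.")
    return min(candidates)[2]

# Kafka/Storm with SLAs -> Premium Storage, per item 3 above.
print(pick_size(min_cpu=16, min_ram_gb=112, needs_premium=True))  # Standard_DS14_v2
```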

Recommendation 2 - Good Compute performance

  1. Use Standard_D13 (8 CPU, 56 GB) or Standard_D14 (16 CPU, 112 GB) for head/master nodes, and Standard_D14 (16 CPU, 112 GB) for data nodes
  2. If testing Kafka and Storm, use Standard_DS13 (8 CPU, 56 GB) or Standard_DS14 (16 CPU, 112 GB) with Premium Storage, especially if performance is needed to meet SLAs
  3. Pros: 60% faster than the A series; local SSD disks
  4. Con: why pick this when it is slightly more expensive per hour than the D_v2 series?

Recommendation 3 - Mostly for Batch performance

  1. Use A10 or A11 for head/master nodes and A11 for data nodes
  2. Note that Microsoft's pricing effectively steers you toward the D_v2 series (see the decision sketch below)
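
Pulling the three recommendations together, here is a minimal Python sketch of the decision logic as this article lays it out. The boolean flags and tier strings are this sketch's own simplification, not Microsoft guidance.

```python
def recommend_tier(real_time_slas: bool, batch_only_on_budget: bool) -> str:
    """Map a workload profile onto the three recommendations in this article."""
    if real_time_slas:
        # Recommendation 1: D_v2/DS_v2 with Premium Storage for Kafka/Storm SLAs.
        return "Recommendation 1: D_v2 / DS_v2 series (Premium Storage)"
    if batch_only_on_budget:
        # Recommendation 3: A10/A11, though D_v2 pricing usually wins anyway.
        return "Recommendation 3: A10 / A11 (compare D_v2 pricing first)"
    # Recommendation 2: plain D/DS series as the middle ground.
    return "Recommendation 2: D / DS series"

print(recommend_tier(real_time_slas=True, batch_only_on_budget=False))
```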

------------

Microsoft Links

Comments
Rising Star

A new guide for deploying HDP on Azure was recently released. See http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_HDP_AzureSetup/content/ch_HDP_AzureSetup.....

Explorer

@Ancil McBarnett Just wondering: don't these specs go against the fundamental design principle of scaling out? The specifications seem very high to me. I thought distributed applications should work well on cheap, commodity hardware, and I was of the view that a cluster of machines with 8 GB RAM, 1 TB disk, and 4 CPUs would do a good job. However, that was not the case: after I set up an 8-node cluster in Azure and ran a job on 1 TB of data, it took 8 hours. I posted a question about this today and tagged you.

Expert Contributor

Do you have any more recent recommendations? Most of our Hadoop processing is on Hive/Tez and Spark.