Created 02-24-2016 02:45 PM
Hi All,
We are planning to set up HDP on Azure, starting with an 18-node cluster, mostly on A-8 to A-11 type instances.
I am looking for setup instructions and best practices.
I went through the following documents, but they are a bit outdated and most of them deal only with the sandbox.
Azure Sandbox - Hortonworks Blog
Deploying Sandbox in Azure - Hortonworks Tutorial
Azure using Cloudbreak (we are not planning to use Cloudbreak or Ambari Blueprints)
Deploying Hortonworks HDP on Microsoft Azure Video
Step by step guide - TechNet blog
A few specific questions:
- Should we place each node in a separate cloud service, or all nodes in a single cloud service?
- Should we set up HDP on all the nodes (18 nodes) and set up a Windows Server with Active Directory for name resolution, as shown in the TechNet blog? Can we not have the HDP cluster running without a jumpbox?
- Do we need to build a separate VPC?
- Is there any documentation that walks through the setup step by step?
- What are the best practices?
Created 02-24-2016 02:52 PM
@Smart Solutions steps are the same as any other HDP install. http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_Security_Guide/content/security-intro.htm...
Created 02-24-2016 02:56 PM
@Smart Solutions I've used this step-by-step guide to set up HDP on Azure before. It is dated but still works: http://blogs.technet.com/b/oliviaklose/archive/2014/07/02/hadoop-on-linux-on-azure-step-by-step-inst...
Created 02-24-2016 02:59 PM
Yes, I've used it too. However, my concern is: should we keep all the nodes in a single cloud service, or each node in a separate cloud service? In the blog they are shown in a single service.
Thanks @Artem Ervits
Created 02-24-2016 02:56 PM
Thanks @Artem Ervits for the security doc. However, I still have other questions that are very specific to Azure. We already have HDP deployed on premises and now want to extend it to Azure.
Created 02-24-2016 03:16 PM
Here are some more considerations, @Smart Solutions:
All VMs in one Availability Set are in the same fault domain. That is, power disruptions and similar failures will affect all of them at once. For HDP, this means that every VM running master services must be placed in its own Availability Set. If you have 5 master nodes, put each of them in its own AS. ZooKeeper is a master service and should be spread across the 5 masters for quorum. Because they are in different Availability Sets, an abrupt shutdown of one will not disrupt the whole cluster.
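To make the quorum point concrete, here is a minimal sketch of the arithmetic, assuming the standard ZooKeeper rule that a strict majority of the ensemble must stay up. The function name is hypothetical and only for illustration.

```python
def zookeeper_fault_tolerance(ensemble_size):
    """Return how many ZooKeeper servers can fail while a quorum survives.

    Assumes the usual majority rule: more than half of the ensemble
    must be running for the service to stay available.
    """
    majority = ensemble_size // 2 + 1
    return ensemble_size - majority


# Compare a few ensemble sizes; 5 matches the 5 masters mentioned above.
for size in (3, 4, 5):
    print(size, "servers tolerate", zookeeper_fault_tolerance(size), "failure(s)")
```

Under that rule, an even-sized ensemble tolerates no more failures than the next smaller odd size, which is one reason odd sizes such as 5 are the usual choice.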
Created 02-24-2016 03:17 PM
@Smart Solutions even more considerations:
- Use premium storage if possible: Azure premium storage is in essence SSD-backed and provides great performance. Premium storage gives 5,000 IOPS per disk, whereas regular storage gives just 500 IOPS per disk.
- VPN tunnel: VPN tunneling provides a dedicated network connection between the on-premises network and the Azure cloud network. This allows integration with on-prem LDAP/AD. You need to ensure that the IP addresses of the Azure cluster don't overlap with on-prem IPs.
- Static IPs: When rebooting the servers, you need to ensure that the Azure VMs are pinned to the same IP addresses again (and not randomly reassigned).
- Make sure all the instances are in the same Azure Virtual Network. By default, VMs connected to the same VNET have full connectivity between them.
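The overlap requirement for the VPN tunnel can be checked before any network is created. The sketch below uses Python's standard `ipaddress` module; the CIDR ranges and the function name are made-up examples, not values from this thread.

```python
import ipaddress


def overlapping_ranges(on_prem_cidrs, azure_cidrs):
    """Return (on_prem, azure) pairs of CIDR blocks that share addresses."""
    clashes = []
    for on_prem in on_prem_cidrs:
        on_prem_net = ipaddress.ip_network(on_prem)
        for azure in azure_cidrs:
            if on_prem_net.overlaps(ipaddress.ip_network(azure)):
                clashes.append((on_prem, azure))
    return clashes


# Hypothetical address spaces, for illustration only.
on_prem = ["10.0.0.0/16", "192.168.10.0/24"]
azure = ["10.0.4.0/22", "10.1.0.0/16"]

for pair in overlapping_ranges(on_prem, azure):
    print("overlap:", pair)
```

With these sample ranges, only 10.0.0.0/16 and 10.0.4.0/22 are reported, so the first Azure range would have to be moved before connecting the two networks.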
Created 02-24-2016 03:17 PM
- Azure cloud service: One cloud service means a single IP address interface for the entire cluster. This is a potential issue if you need to use the same port number for different services; in that case you need to create multiple cloud services (e.g. if you need both Ambari and HUE on port 8080 of the external IP).
- Mount points: None of the mount points added to the master and compute nodes should have "noexec" set; the flags should be left at their defaults. If a mount point is set to "noexec", YARN won't be able to execute on those disks.
- Storage accounts: Use a separate storage account for each node in the cluster, to avoid the IOPS limits that apply when multiple VMs share the same storage account.
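One way to check the mount point advice on a node is to scan the mount table for the "noexec" option. This is a rough sketch that assumes the Linux `/proc/mounts` layout (device, mount point, filesystem type, options); the `/grid` paths, the sample table, and the function name are hypothetical.

```python
def noexec_mounts(mounts_text, prefixes):
    """Return mount points under the given prefixes that carry 'noexec'."""
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        under_prefix = any(mount_point.startswith(p) for p in prefixes)
        if under_prefix and "noexec" in options:
            flagged.append(mount_point)
    return flagged


# Sample mount table with made-up devices and data directories.
SAMPLE = """\
/dev/sdc1 /grid/0 ext4 rw,noatime 0 0
/dev/sdd1 /grid/1 ext4 rw,noexec,noatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,noexec 0 0
"""

print(noexec_mounts(SAMPLE, ["/grid"]))
```

On a real node the same function could be fed the contents of `/proc/mounts`, with the prefixes replaced by wherever the YARN local directories actually live.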
Created 02-24-2016 03:46 PM
- Should we place each node in a separate cloud service, or all nodes in a single cloud service?
No
- Should we set up HDP on all the nodes (18 nodes) and set up a Windows Server with Active Directory for name resolution, as shown in the TechNet blog? Can we not have the HDP cluster running without a jumpbox?
This will work. HDP will be on all the nodes, and you can have AD on one of the nodes.
- Do we need to build a separate VPC?
Yes, for security reasons.
- Is there any documentation that walks through the setup step by step?
Yes, see the official blog.
- What are the best practices around?