How to set up a secure HDP production cluster on Azure?

Expert Contributor

Hi All,

We are planning to set up HDP on Azure, starting with an 18-node cluster, mostly on A8 to A11 instances.

I am looking for setup instructions and best practices.

I went through the following documents, but they are a bit outdated and most of them deal with the sandbox.

Azure Sandbox - Hortonworks Blog

Deploying Sandbox in Azure - Hortonworks Tutorial

Azure using Cloudbreak (we are not planning to use Cloudbreak or Ambari Blueprints)

Deploying Hortonworks HDP on Microsoft Azure Video

Step by step guide - TechNet blog

A few specific questions:

- Should we place each node in a separate cloud service, or all nodes in a single cloud service?

- Should we set up HDP on all 18 nodes and set up a Windows Server with Active Directory for name resolution, as shown in the TechNet blog? Can we run the HDP cluster without a jumpbox?

- Do we need to build a separate VPC?

- Is there any documentation that walks through the setup step by step?

- What are the best practices?

11 REPLIES

Master Mentor

@Smart Solutions I've used this step-by-step guide to set up HDP on Azure before. It is dated but still works: http://blogs.technet.com/b/oliviaklose/archive/2014/07/02/hadoop-on-linux-on-azure-step-by-step-inst...

Expert Contributor

Yes, I've used it too. However, my concern is: should we keep all the nodes in a single cloud service, or each node in a separate cloud service? In the blog they are shown in a single service.

Thanks @Artem Ervits

Expert Contributor

Thanks @Artem Ervits for the security doc. However, I still have other questions that are very specific to Azure. We already have HDP deployed on premises and now want to extend it to Azure.

Master Mentor

Here are some more considerations, @Smart Solutions:

All VMs in one Availability Set are in the same fault domain. That is, power disruptions and similar failures will affect all of them at once. For HDP, this means that every VM running master services must be placed in its own Availability Set. If you have 5 master nodes, put each of them in its own AS. ZooKeeper is a master service; it should be spread across the 5 masters for quorum, and because they are in different AS, they are protected from an abrupt shutdown disrupting the whole cluster.
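
To make the quorum point concrete, here is a minimal sketch of the majority arithmetic behind a ZooKeeper ensemble: it stays available only while a strict majority of its servers is up. The function names are illustrative and not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority still remains."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare a few ensemble sizes, including the 5 masters discussed above.
    for n in (3, 4, 5):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

With 5 masters the quorum is 3, so the ensemble survives the loss of 2 of them. A 4-node ensemble tolerates only 1 failure, the same as a 3-node one, which is why odd sizes are usually chosen.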

Master Mentor

@Smart Solutions, even more considerations:

Use premium storage if possible: Azure premium storage is in essence SSD and provides great performance. Premium storage gives 5,000 IOPS per disk, compared with just 500 IOPS per disk for regular storage.

VPN tunnel: A VPN tunnel provides a dedicated network connection between the on-premises network and the Azure cloud network. This allows integration with on-prem LDAP/AD. Make sure the IP addresses of the Azure cluster don't overlap with on-prem IPs.

Static IPs: Make sure that after a reboot the Azure VMs are pinned to the same IP addresses again (and not randomly reassigned).

Virtual network: Make sure all the instances are in the same Azure Virtual Network. By default, VMs connected to the same VNET have full connectivity between them.
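
The address-overlap check mentioned for the VPN tunnel can be done before any network is created. The sketch below uses Python's standard `ipaddress` module. The CIDR ranges and the function name `find_overlaps` are made-up examples, not values from this thread.

```python
import ipaddress


def find_overlaps(azure_cidrs, onprem_cidrs):
    """Return (azure, onprem) CIDR pairs whose address ranges intersect."""
    overlaps = []
    for azure in azure_cidrs:
        azure_net = ipaddress.ip_network(azure)
        for onprem in onprem_cidrs:
            if azure_net.overlaps(ipaddress.ip_network(onprem)):
                overlaps.append((azure, onprem))
    return overlaps


if __name__ == "__main__":
    # Hypothetical address plans; replace with your own ranges.
    azure_plan = ["10.1.0.0/16"]
    onprem_plan = ["10.0.0.0/8", "192.168.0.0/24"]
    for azure, onprem in find_overlaps(azure_plan, onprem_plan):
        print(f"Conflict: Azure range {azure} overlaps on-prem range {onprem}")
```

An empty result means the planned Azure range can be routed over the tunnel without colliding with on-prem addresses.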

Master Mentor

Azure cloud service: One cloud service means a single IP address interface for the entire cluster. This is a potential issue: if you need to use the same port number for different services, you need to create multiple cloud services (e.g. if Ambari and HUE both need to be on port 8080 on the external IP).

Mount points: None of the mount points added to the master and compute nodes should have "noexec" set; leave the flags at their defaults. If a mount point is set to "noexec", YARN won't be able to execute on those disks.

Storage accounts: Use a separate storage account for each node in the cluster, to avoid the IOPS limits that apply when multiple VMs share the same storage account.
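
One way to verify the mount-point advice on a Linux node is to scan the mount table for the "noexec" option. This is a rough sketch that parses text in the `/proc/mounts` layout (device, mount point, filesystem type, options). The `/grid` prefix and the sample entries are assumptions for illustration only.

```python
SAMPLE_MOUNTS = """
/dev/sdc1 /grid/0 ext4 rw,noatime 0 0
/dev/sdd1 /grid/1 ext4 rw,noexec,noatime 0 0
tmpfs /run tmpfs rw,nosuid,noexec 0 0
"""


def noexec_mounts(mounts_text, prefixes):
    """Return mount points under the given prefixes that carry 'noexec'."""
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        under_prefix = any(
            mount_point == p or mount_point.startswith(p.rstrip("/") + "/")
            for p in prefixes
        )
        if under_prefix and "noexec" in options:
            flagged.append(mount_point)
    return flagged


if __name__ == "__main__":
    # On a real node, read the live table instead:
    #   mounts_text = open("/proc/mounts").read()
    for mount_point in noexec_mounts(SAMPLE_MOUNTS, ["/grid"]):
        print(f"{mount_point} is mounted noexec; YARN cannot execute there")
```

Here only `/grid/1` is reported, since `/run` is outside the checked prefix.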

Master Mentor

- Should we place each node in a separate cloud service, or all nodes in a single cloud service?

No

- Should we set up HDP on all 18 nodes and set up a Windows Server with Active Directory for name resolution, as shown in the TechNet blog? Can we run the HDP cluster without a jumpbox?

This will work. HDP will be on all the nodes, and you can run AD on one of them.

- Do we need to build a separate VPC?

Yes, for security reasons.

- Is there any documentation that walks through the setup step by step?

Yes, see the official blog.

- What are the best practices?