
HDF on Azure - guidance

Looking for documentation on installing HDF on Azure. I see that there is no marketplace template, so it will be a pure IaaS setup. This is for a PoC. The plan is to set up a 3-node NiFi-only cluster (no Kafka, Storm, etc.) with one management node for security and operations, using Ambari to install NiFi.

Looking for guidance specifically on these areas:

  1. OS image to use on Azure
  2. Any OS level tuning/configuration that needs to be done
  3. Anything networking related besides Azure vnet
  4. Recommended foundational software with version – e.g. Java version and anything else
  5. Minimum config - VM SKU, disk SKU, for operations and security node, and disk partitioning
  6. Minimum config - VM SKU, disk SKU, disk partitioning for NiFi nodes
  7. Any best practices
  8. Detailed documentation

Thanks in advance.

Accepted Solutions

Re: HDF on Azure - guidance

@Anagha Khanolkar

There is no preferred OS for HDF; use the one you know best. I will say that Linux is the most tested OS.

There is documentation that covers OS-specific tuning and best practices. The only required software is Java 8.
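Since Java 8 is the one hard prerequisite, it may be worth confirming the installed version on each VM before running the Ambari install. The following is a minimal sketch in Python, not part of the HDF tooling: the function names `parse_java_major` and `installed_java_major` are made up for illustration, and it assumes `java` is on the `PATH` and prints the usual `version "1.8.0_…"` style banner.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x, e.g. "1.8.0_181".
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major(java_cmd="java"):
    """Run `java -version` and return the major version, or None if unavailable."""
    try:
        result = subprocess.run(
            [java_cmd, "-version"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # The java executable is missing or cannot be started.
        return None
    # `java -version` writes its banner to stderr, so check both streams.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major == 8:
        print("Java 8 found")
    else:
        print("Java 8 not found (detected major version: {})".format(major))
```

Running the script on each node gives a quick yes/no answer; anything other than 8 would be a cue to check the HDF support matrix for the release being installed.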

The minimum system resources will be driven by the volume of data, the size of the files, and how much processing is done on the data.

Here is a link to documentation that provides a good starting point and hardware sizing recommendations: Planning your deployment
