In this tutorial, I create a NiFi cluster from the default blueprint provided with Cloudbreak. The post was originally written for Cloudbreak 2.5.0 TP and has been updated to reflect what is available in Cloudbreak 2.9.0.

Update for Cloudbreak 2.9.0

This post was originally written for Cloudbreak 2.5.0 TP, which introduced support for creating HDF Flow Management (NiFi) clusters. It has been updated to reflect the latest features available in the Cloudbreak 2.9.0 general availability release.

Cloudbreak 2.5.0 TP introduced support for creating HDF Flow Management (NiFi) clusters. The subsequent release, Cloudbreak 2.6.0 TP, introduced support for creating HDF Messaging Management (Kafka) clusters. Cloudbreak 2.7.0 GA made both generally available. In Cloudbreak 2.9.0 GA, two HDF 3.3 blueprints are included by default, one for Flow Management (NiFi) and one for Messaging Management (Kafka).

What is Cloudbreak?

Cloudbreak simplifies the provisioning, management, and monitoring of on-demand HDP and HDF clusters in virtual and cloud environments. It leverages cloud infrastructure to create host instances, and uses Apache Ambari via Ambari blueprints to provision and manage HDP and HDF clusters.

Cloudbreak allows you to create HDP and HDF clusters using the Cloudbreak web UI, Cloudbreak CLI, and Cloudbreak REST API. Clusters can be launched on public cloud infrastructure platforms Microsoft Azure, Amazon Web Services (AWS), and Google Cloud Platform, and on the private cloud infrastructure platform OpenStack.

Support for creating NiFi clusters was introduced in Cloudbreak 2.5.0 Technical Preview and became generally available in Cloudbreak 2.7.0.

Prerequisites

Cloud Provider Account

In order to use Cloudbreak, you must have access to a cloud provider account (AWS, Azure, Google Cloud, OpenStack) on which resources can be provisioned.

Launch Cloudbreak

If you are part of Hortonworks, you have access to the hosted Cloudbreak instance. If you do not have access to this instance, you must launch Cloudbreak on your chosen cloud platform.

The instructions for launching Cloudbreak are available here:

Cloudbreak Deployment Options

Quickstart on AWS/Azure/GCP

Installing Cloudbreak on AWS/Azure/GCP/OpenStack

Create a Cloudbreak Credential

Once your Cloudbreak instance is running, you must create a Cloudbreak credential before you can start creating HDP or HDF clusters. Why do you need this? By creating a Cloudbreak credential, you provide Cloudbreak with the means to authenticate with your cloud provider account and provision resources (virtual networks, VMs, and so on) on that account. Creating a Cloudbreak credential is always required, even if the Cloudbreak instance is running on your cloud provider account.

The instructions for creating a Cloudbreak credential are available here:

Creating a Cloudbreak Credential on AWS/Azure/GCP/OpenStack

> Tip: When using a corporate cloud provider account, you are unlikely to be able to perform all the required prerequisite steps by yourself, so you may need to ask your IT department to perform some of the steps for you. For related tips, refer to this HCC post.

Create a Flow Management Cluster

You can create clusters from the Cloudbreak web UI and from the Cloudbreak CLI. It’s best to get started with the UI before attempting to use the CLI.

1. Log in to the Cloudbreak UI.

2. Click Create Cluster to open the Create Cluster wizard. By default, Basic view is displayed. You can click Advanced to see more options, but in this post we are just addressing basic parameters.

3. On the General Configuration page, specify the following general parameters for your cluster:

  • Select Credential: Select the credential that you created as a prerequisite. After you have selected the credential, the wizard options will be adjusted for the cloud platform that the credential can be used for, and parameters such as regions will be populated. In my case, I selected an AWS-specific credential, so my cluster will be created on AWS.
  • Cluster Name: Enter a name for your cluster, for example, “nifi-test”.
  • Region: Select the region that you would like to use (typically your local region). On some cloud providers, you can also select an availability zone within a region.
  • Platform Version: Select “HDF 3.3” (Sorry for the old screenshot showing 3.2).
  • Cluster Type: Select “Flow Management: Apache NiFi”. This is a default blueprint which includes Apache NiFi.

4. When done, click Next.

5. On the Hardware and Storage page, Cloudbreak pre-populates recommended instance types/count and storage type/size. You may adjust these depending on how many nodes and how much storage you want, and of what type. By default, a 2-node cluster will be created with one node in each host group (Services host group and NiFi host group).

6. Before proceeding to the next page, you must select the host group on which Ambari Server will be installed. Under the “Services” host group, check “Ambari Server” so that Ambari Server is installed on that host group.

7. When done, click Next.

8. On the Network and Availability page, you can proceed with the default settings or adjust the settings in the following way:

  • Select Network: If you do not make a selection, a new network will be created on the cloud provider. If you already have a network that you would like to use, you can select it here; otherwise you can keep the default. Note that if you are using a shared account, there are limits to how many virtual networks can be created per region within a given account.
  • Select Subnet: If you do not make a selection, a new subnet will be created on the cloud provider within the network selected under “Select Network”.
  • Subnet (CIDR): By default, 10.0.0.0/16 is used.
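
To get a feel for how much address space the default CIDR gives you, remember that a /N IPv4 block contains 2^(32-N) addresses. Here is a minimal shell sketch of that arithmetic; the function name cidr_size is my own and is not part of Cloudbreak:

```shell
#!/bin/sh
# cidr_size is a hypothetical helper name, not a Cloudbreak command.
# Prints the number of IPv4 addresses in a CIDR block such as 10.0.0.0/16.
cidr_size() {
  # "${1#*/}" strips everything up to the slash, leaving the prefix length.
  echo $(( 1 << (32 - ${1#*/}) ))
}

cidr_size 10.0.0.0/16   # the default subnet CIDR mentioned above
cidr_size 10.0.1.0/24   # a smaller block, for comparison
```

Keep in mind that cloud providers typically reserve a few addresses in every subnet, so the number of usable host addresses is slightly lower than the raw block size.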

9. On the Gateway Configuration page, do not change anything. Just click Next.

10. On the Network Security Groups page, add the following rules:

  • On the NiFi host group, specify a TCP rule to open port 9091 to your public IP and click + to add it.
  • On the Services host group, specify a TCP rule to open port 61443. This port is used by NiFi Registry.

Your configuration should look like this:


> Tip: If you are planning to use NiFi processors that require additional ports, add additional rules to open these ports.

> Tip: By default, Cloudbreak creates a new security group for each host group. By default, ports 9443, 22, and 443 are open to all (0.0.0.0/0) on the Services host group (because this is where Ambari Server is installed); and port 22 is open to all (0.0.0.0/0) on the NiFi host group. These settings are not suitable for production. If you are planning to leave your cluster running for longer than a few hours, review the guidelines documented here and limit the access by (1) deleting the default rules and (2) adding new rules by setting the CIDR to “My IP” and “Custom” (use “Custom” for specifying the Cloudbreak instance IP).
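
Before typing rules into the wizard, it can help to write them down and check that none of them is open to everyone. The following is only a sketch: the "host-group protocol port cidr" line format and the function names are invented for illustration, 203.0.113.10 is a documentation-range placeholder for your own public IP, and restricting port 61443 to your IP is my assumption rather than a Cloudbreak requirement.

```shell
#!/bin/sh
# Hypothetical helpers: the rule format below is made up for this sketch.
# Cloudbreak itself takes these values through the wizard.
MY_IP="203.0.113.10"   # placeholder; replace with your public IP

# Print the two rules added in step 10, restricted to one address (/32).
tutorial_rules() {
  printf 'nifi tcp 9091 %s/32\n' "$1"       # port opened on the NiFi host group
  printf 'services tcp 61443 %s/32\n' "$1"  # port used by NiFi Registry
}

# Read rules on stdin and print those whose CIDR is open to all.
open_to_all() {
  awk '$4 == "0.0.0.0/0"'
}

tutorial_rules "$MY_IP"

# A default-style rule such as SSH open to all would be flagged:
printf 'nifi tcp 22 0.0.0.0/0\n' | open_to_all
```

Anything printed by open_to_all is a candidate for replacing with a “My IP” or “Custom” rule.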

11. On the Security page, provide the following information:

  • Cluster User: Enter the name for the cluster user. Default is “admin”. You will use this to log in to the Ambari web UI.
  • Password and Confirm Password: Enter the password for the cluster user. You will use this to log in to the Ambari web UI.
  • SSH public key: Paste your existing SSH public key or select an SSH public key that you have previously uploaded to your cloud provider account. Later, in order to access the cluster nodes via SSH, you will have to provide the matching private key.
  • Enable Kerberos Security: If you are just getting started, select “Use Test KDC” to have a new test KDC provisioned for the cluster.

> Warning: Make sure not to disable Kerberos. If you don’t have an existing KDC, select the option to create a test KDC. If you use the default Flow Management blueprint without enabling Kerberos, the NiFi UI will be inaccessible unless you configure an SSL certificate or register and use an existing LDAP.

12. At this point, you have provided all parameters required to create your cluster. Click CREATE CLUSTER to start the cluster creation process.

13. You will be redirected to the cluster dashboard, and the cluster status presented on the corresponding tile will be “Create in progress” (blue color). When the cluster is ready, its status will change to “Running”.

Access and Manage Clusters

Once the status of your cluster changes to “Running”, click on the cluster tile to view cluster details where you can find information related to your cluster and access cluster-related options.

Note the following options:

1. Click on the link under Ambari URL to access the Ambari web UI in the browser.

2. Log in to the Ambari web UI by using the cluster user and password that you specified when creating the cluster. Since the Ambari web UI is set up with a self-signed SSL certificate, the first time you access it your browser will warn you about an untrusted connection and will ask you to confirm a security exception. Once you have logged in, you can access the NiFi service from the Ambari dashboard.

The NiFi UI link is available from Quick Links.

3. To access cluster nodes via SSH, use:

  • The “cloudbreak” user
  • The private key corresponding to the public key that you provided/selected when creating a cluster
  • The VM public IP address, which you can obtain from the Hardware pane in the cluster details

For example, on Mac OS X:

ssh -i "mytest-kp.pem" cloudbreak@52.25.169.132
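
If ssh refuses the key with an "unprotected private key file" warning, the key file's permissions are too open. Here is a small sketch that tightens the permissions and prints the connection command for review; the key file name and IP address are the example values from above, and the helper name connect_cmd is my own:

```shell
#!/bin/sh
# connect_cmd is a hypothetical helper: it only prints the ssh command so
# that you can review it before running it yourself.
connect_cmd() {
  printf 'ssh -i "%s" cloudbreak@%s\n' "$1" "$2"
}

# ssh rejects private keys that other users can read, so restrict the file
# to its owner first (skipped if the key is not in the current directory).
if [ -f mytest-kp.pem ]; then
  chmod 400 mytest-kp.pem
fi

connect_cmd mytest-kp.pem 52.25.169.132
```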

4. The Cloudbreak web UI provides options to Stop/Start the cluster and to Sync it with the cloud provider. When you no longer need the cluster, you can terminate it by using the Terminate option available in the cluster details.

> Resizing and autoscaling: In general, downscaling NiFi clusters is not supported, because removing a node that has not yet processed all of its data can result in data loss. Upscaling is supported, but there is a known issue which requires you to manually update the newly added hosts (see Known Issues).

5. The Show CLI command option allows you to generate a JSON template for the existing cluster; the template can later be used to automate cluster creation with Cloudbreak CLI.

6. You can download Cloudbreak CLI by selecting Download CLI from the navigation pane. The CLI is available for Mac OS X/Windows/Linux.

7. You only need to configure the CLI once so that it can be used with your Cloudbreak instance:

./cb configure --server <cloudbreak IP> --username <cloudbreak-user> --password <cloudbreak-password>

8. Once it has been configured, you can view available commands by using:

./cb
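
Once the CLI is configured, the JSON template from the Show CLI command option (step 5) can be fed back to the CLI to create another cluster. The sketch below is a dry run that only prints the command. The subcommand and flags (cluster create, --cli-input-json, --name) are as I remember them from the Cloudbreak 2.x CLI, so please confirm them against the command list printed by ./cb before relying on this. The file name and cluster name are hypothetical.

```shell
#!/bin/sh
# Sketch only: "cluster create", "--cli-input-json" and "--name" are assumed
# from memory of the Cloudbreak 2.x CLI; verify them in the CLI's own help.
TEMPLATE="nifi-test.json"    # hypothetical file saved from Show CLI command
CLUSTER_NAME="nifi-test-2"   # hypothetical name for the new cluster

# With DRY_RUN=1 (the default here) the command is printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run ./cb cluster create --cli-input-json "$TEMPLATE" --name "$CLUSTER_NAME"
```

Set DRY_RUN=0 only after you have checked the printed command against your CLI version.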

Advanced Cluster Options

Cloudbreak includes additional advanced options, some of which are cloud platform-specific. To review them, refer to the following docs:

Creating a Cluster on AWS/Azure/GCP/OpenStack

Learn More

Cloudbreak 2.9.0 docs

Creating HDF clusters
