Cloudera Employee

We have created a quickstart guide that walks you step by step from a Google Cloud Platform (GCP) account to provisioning big data workloads on GCP with Hortonworks Cloudbreak. This overview covers the high-level tasks involved and points you to the documentation that contains the step-by-step instructions.


There are four main steps to get started with GCP.

  1. Cloudbreak GCP Prerequisites: Setting up your GCP Environment to provision Cloudbreak and HDP/HDF workloads.
  2. Install Cloudbreak on GCP: Setting up Cloudbreak on GCP.
  3. Create Cloudbreak Credentials: Setting up your credentials in Cloudbreak to deploy HDP/HDF workloads on GCP.
  4. Create a Cluster on GCP: A step-by-step walkthrough of creating a big data workload on GCP.

Cloudbreak GCP Prerequisites

Before installing Cloudbreak, we need to set up your working environment and GCP account. In this step, we will set up the following prerequisites.

  1. Install Google Cloud SDK: Setting up the Google Cloud SDK, which includes the gcloud CLI, on your local machine. It is needed to install Cloudbreak using the GCP Quickstart Template.
  2. Enable GCP APIs: Turning on the Compute Engine API and Cloud Runtime Configuration API in your GCP account.
  3. Cloudbreak Service Account: Setting up a service account in GCP used by Cloudbreak to create VM instances on GCP for your HDP/HDF cluster nodes.
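
As a rough illustration of the second and third prerequisites, the sketch below builds the gcloud commands that enable the two APIs and create a service account. It only assembles and prints the commands; it does not run them. The project ID, the account name `cloudbreak-sa`, and the single role shown are assumptions made for the example. The walkthrough linked below lists the roles Cloudbreak actually requires.

```python
import shlex

# Service names for the two APIs named in the prerequisites.
REQUIRED_APIS = ["compute.googleapis.com", "runtimeconfig.googleapis.com"]


def prerequisite_commands(project_id, account_name="cloudbreak-sa"):
    """Return gcloud commands (as argument lists) for the prerequisite steps.

    account_name is a hypothetical name chosen for this example.
    """
    email = f"{account_name}@{project_id}.iam.gserviceaccount.com"
    return [
        # Enable the Compute Engine and Cloud Runtime Configuration APIs.
        ["gcloud", "services", "enable", *REQUIRED_APIS,
         "--project", project_id],
        # Create the service account that Cloudbreak will use.
        ["gcloud", "iam", "service-accounts", "create", account_name,
         "--display-name", "Cloudbreak service account",
         "--project", project_id],
        # Grant one role to the account. Assumption: this role is only an
        # example; check the prerequisites walkthrough for the full list.
        ["gcloud", "projects", "add-iam-policy-binding", project_id,
         "--member", f"serviceAccount:{email}",
         "--role", "roles/compute.instanceAdmin.v1"],
    ]


if __name__ == "__main__":
    # "my-gcp-project" is a placeholder project ID.
    for cmd in prerequisite_commands("my-gcp-project"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before pasting them into a shell where the Google Cloud SDK is installed and authenticated.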

Prerequisites Step by Step Walkthrough

Install Cloudbreak on GCP

Cloudbreak has two installation methods:

  1. Cloudbreak Deployer: Bring your own VM and use the Cloudbreak deployer to manually install Cloudbreak. This is suitable for production environments and provides the most flexibility.
  2. Cloudbreak Quickstart Template: Use our prebuilt GCP Cloud Deployment Manager template to install Cloudbreak. This is suitable for proofs of concept and development environments, and the quickstart guide provides detailed instructions for it.
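
For the quickstart template route, a Deployment Manager deployment is created from a configuration file with gcloud. The sketch below shows the general shape of that call. The deployment name and the configuration file name are placeholders, not values taken from the quickstart guide, and the helper function is hypothetical.

```python
import shlex


def quickstart_deploy_command(deployment_name, config_path):
    """Build the gcloud call that creates a Deployment Manager deployment."""
    return ["gcloud", "deployment-manager", "deployments", "create",
            deployment_name, "--config", config_path]


# Placeholder values: substitute the template configuration file
# that the quickstart guide tells you to download and edit.
print(shlex.join(quickstart_deploy_command("cbd-deployment",
                                           "vm_template_config.yaml")))
```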

Installing Cloudbreak Step by Step Walkthrough

Create Cloudbreak Credentials

The Cloudbreak credential is set up in Cloudbreak using the GCP service account created as part of the prerequisites. Cloudbreak uses this account to instantiate VMs on your behalf and create the HDP/HDF cluster.

Cloudbreak Credentials Step by Step Walkthrough

Create a Cluster on GCP

Once you’ve set up your GCP credential in Cloudbreak, you are ready to create your first HDP/HDF cluster. Cloudbreak offers the following three ways to create a cluster.

  1. Basic Create Cluster Wizard: Requires the fewest steps and the least information to create a cluster on GCP. This is the process that we walk through in the quickstart guide.
  2. Advanced Create Cluster Wizard: Cloudbreak helps you deploy big data workloads on GCP quickly, and it also provides a great deal of flexibility for the customizations typical of an enterprise environment. Custom OS images, shell scripts, LDAP/Kerberos authentication, and external databases for HDP components are a few examples of what can be set up in Cloudbreak for use during the build process of your cluster.
  3. Command Line Interface: All of the functionality in the UI, including cluster provisioning, is also available through the Cloudbreak CLI. This enables you to integrate Cloudbreak with other DevOps management tools.
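
To give a sense of what integrating the CLI with other tooling might look like, here is a minimal sketch of a Python wrapper that a pipeline could call. It assumes the Cloudbreak CLI is installed as `cb`, is already configured for your Cloudbreak server, and accepts a cluster definition through `cb cluster create --cli-input-json ... --name ...`. Confirm the exact flags for your Cloudbreak version with `cb cluster create --help`. The function names and the template file name are hypothetical.

```python
import subprocess


def cluster_create_command(name, template_path):
    """Build the Cloudbreak CLI call for creating a cluster from a JSON file.

    Assumption: the flag names below match your Cloudbreak CLI version.
    """
    return ["cb", "cluster", "create",
            "--cli-input-json", template_path,
            "--name", name]


def create_cluster(name, template_path, dry_run=True):
    """Return the command, and execute it only when dry_run is False."""
    cmd = cluster_create_command(name, template_path)
    if not dry_run:
        # Raises CalledProcessError if the CLI exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


# Dry run: shows the command without contacting Cloudbreak.
print(" ".join(create_cluster("my-first-cluster", "cluster-template.json")))
```

Keeping a dry-run default lets a pipeline log or review the command before it provisions real VMs.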

Creating a Cluster Step by Step Walkthrough

What Next: Google Cloud Storage

After completing the quickstart guide and provisioning your first workload on GCP, the next step is leveraging Google Cloud Storage (GCS) with your big data workloads. Here are some additional resources for setting up GCS and attaching GCS buckets to HDP workloads using Cloudbreak.
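
Once access is configured, Hadoop tools address GCS data through `gs://` URIs. The small sketch below builds such a URI and the matching `hadoop fs -ls` command. It assumes the cluster has been set up for GCS access as described in the resource below. The bucket and path names are placeholders, and the helper functions are hypothetical.

```python
def gcs_uri(bucket, path=""):
    """Return a gs:// URI for an object or prefix in a bucket."""
    return f"gs://{bucket}/{path.lstrip('/')}"


def list_command(bucket, path=""):
    """Build a Hadoop filesystem listing command for a GCS location."""
    return ["hadoop", "fs", "-ls", gcs_uri(bucket, path)]


# Placeholder bucket and prefix.
print(" ".join(list_command("my-bucket", "data/input")))
```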

Configuring Access to Google Cloud Storage