Objectives

HDPSearch 4.0 was recently announced (Blog); it upgrades Solr from 6.6 to 7.4. The HDPSearch 4.0 Ambari management pack will install HDPSearch 3.0 on HDP 2.6 and HDPSearch 4.0 on HDP 3.0. HDP 3.0 is required for HDPSearch 4.0 because the HDFS and Hive libraries have been updated for Hadoop 3.1. Using Cloudbreak 2.8 Tech Preview (TP), you can install an HDP 3.0 cluster that includes HDPSearch 4.0 using Cloudbreak's management pack extensions.

Cloudbreak 2.8 is a Tech Preview release and is not suitable for production usage. Similarly, CB 2.8 TP doesn't officially support deploying HDP 3.0 clusters. The intent of this tutorial is to become familiar with the process before Cloudbreak 2.9 is released.

This tutorial is designed to walk you through the process of deploying an HDP 3.0 cluster, which includes HDPSearch 4.0 components, on AWS using a custom Ambari blueprint.

Prerequisites

  • You should already have an installed version of Cloudbreak 2.8.
  • You can find an article that walks you through installing a local version of Cloudbreak with Vagrant and VirtualBox here: HCC Article
  • You should have an AWS account with appropriate permissions.
  • You should already have created your AWS credential in Cloudbreak.
  • You should be familiar with HDPSearch.

Scope

This tutorial was tested in the following environment:

  • Cloudbreak 2.8.0
  • HDPSearch 4.0
  • AWS (also works on Azure and Google)

Steps

1. Create New HDP Blueprint

We need to create a custom Ambari blueprint for an HDP 3.0 cluster. This tutorial provides a basic blueprint which has HDFS and YARN HA enabled.

Login to your Cloudbreak instance. In the left menu, click on Blueprints. Cloudbreak will display a list of built-in and custom blueprints. Click on the CREATE BLUEPRINT button. You should see something similar to the following:

93473-cb-create-blueprint.png

If you have downloaded the blueprint JSON file, you can simply upload it to create your new blueprint. You can download the JSON blueprint file here: hdp301-ha-solr-blueprint.json. Cloudbreak requires the name within the blueprint itself to be unique; if you wish to customize the blueprint name, you can edit it in the editor window after uploading the blueprint. Enter a unique Name and a meaningful Description for the blueprint. These are displayed on the blueprint list screen.

Click on the Upload JSON File button and select the blueprint JSON file you downloaded. You should see something similar to this:

93474-cb-blueprint-json-upload.png

Scroll to the bottom and click on the CREATE button. You should see the list of blueprints, including the newly created blueprint. You should see something similar to the following:

93475-cb-blueprint-list.png

You can also choose to paste the JSON text by clicking on the Text radio button.

Here is the text of the blueprint JSON:

{
  "Blueprints": {
    "blueprint_name": "hdp301-ha-solr",
    "stack_name": "HDP",
    "stack_version": "3.0"
  },
  "settings": [
    {
      "recovery_settings": []
    },
    {
      "service_settings": [
        {
          "name": "HIVE",
          "credential_store_enabled": "false"
        }
      ]
    },
    {
      "component_settings": []
    }
  ],
  "host_groups": [
    {
      "name": "master_mgmt",
      "components": [
        {
          "name": "METRICS_COLLECTOR"
        },
        {
          "name": "METRICS_GRAFANA"
        },
        {
          "name": "ZOOKEEPER_SERVER"
        },
        {
          "name": "JOURNALNODE"
        },
        {
          "name": "INFRA_SOLR"
        },
        {
          "name": "INFRA_SOLR_CLIENT"
        },
        {
          "name": "METRICS_MONITOR"
        },
        {
          "name": "ZOOKEEPER_CLIENT"
        },
        {
          "name": "HDFS_CLIENT"
        },
        {
          "name": "YARN_CLIENT"
        },
        {
          "name": "OOZIE_CLIENT"
        },
        {
          "name": "MAPREDUCE2_CLIENT"
        },
        {
          "name": "HIVE_CLIENT"
        },
        {
          "name": "TEZ_CLIENT"
        },
        {
          "name": "HIVE_METASTORE"
        },
        {
          "name": "HIVE_SERVER"
        }
      ],
      "cardinality": "1"
    },
    {
      "name": "master_nn1",
      "components": [
        {
          "name": "NAMENODE"
        },
        {
          "name": "ZKFC"
        },
        {
          "name": "RESOURCEMANAGER"
        },
        {
          "name": "METRICS_MONITOR"
        },
        {
          "name": "APP_TIMELINE_SERVER"
        },
        {
          "name": "ZOOKEEPER_SERVER"
        },
        {
          "name": "JOURNALNODE"
        },
        {
          "name": "HIVE_CLIENT"
        },
        {
          "name": "HDFS_CLIENT"
        },
        {
          "name": "YARN_CLIENT"
        },
        {
          "name": "OOZIE_CLIENT"
        },
        {
          "name": "ZOOKEEPER_CLIENT"
        },
        {
          "name": "LIVY2_SERVER"
        },
        {
          "name": "SPARK2_CLIENT"
        },
        {
          "name": "MAPREDUCE2_CLIENT"
        },
        {
          "name": "TEZ_CLIENT"
        }
      ],
      "cardinality": "1"
    },
    {
      "name": "master_nn2",
      "components": [
        {
          "name": "NAMENODE"
        },
        {
          "name": "ZKFC"
        },
        {
          "name": "RESOURCEMANAGER"
        },
        {
          "name": "METRICS_MONITOR"
        },
        {
          "name": "HISTORYSERVER"
        },
        {
          "name": "HIVE_SERVER"
        },
        {
          "name": "PIG"
        },
        {
          "name": "OOZIE_SERVER"
        },
        {
          "name": "ZOOKEEPER_SERVER"
        },
        {
          "name": "JOURNALNODE"
        },
        {
          "name": "HIVE_CLIENT"
        },
        {
          "name": "HDFS_CLIENT"
        },
        {
          "name": "YARN_CLIENT"
        },
        {
          "name": "OOZIE_CLIENT"
        },
        {
          "name": "ZOOKEEPER_CLIENT"
        },
        {
          "name": "SPARK2_JOBHISTORYSERVER"
        },
        {
          "name": "SPARK2_CLIENT"
        },
        {
          "name": "MAPREDUCE2_CLIENT"
        },
        {
          "name": "TEZ_CLIENT"
        }
      ],
      "cardinality": "1"
    },
    {
      "name": "datanode",
      "components": [
        {
          "name": "HIVE_CLIENT"
        },
        {
          "name": "TEZ_CLIENT"
        },
        {
          "name": "SPARK2_CLIENT"
        },
        {
          "name": "YARN_CLIENT"
        },
        {
          "name": "OOZIE_CLIENT"
        },
        {
          "name": "DATANODE"
        },
        {
          "name": "METRICS_MONITOR"
        },
        {
          "name": "NODEMANAGER"
        },
        {
          "name": "SOLR_SERVER"
        }
      ],
      "cardinality": "1+"
    }
  ],
  "configurations": [
    {
      "core-site": {
        "properties": {
          "fs.trash.interval": "4320",
          "fs.defaultFS": "hdfs://mycluster",
          "ha.zookeeper.quorum": "%HOSTGROUP::master_nn1%:2181,%HOSTGROUP::master_nn2%:2181,%HOSTGROUP::master_mgmt%:2181",
          "hadoop.proxyuser.falcon.groups": "*",
          "hadoop.proxyuser.root.groups": "*",
          "hadoop.proxyuser.livy.hosts": "*",
          "hadoop.proxyuser.falcon.hosts": "*",
          "hadoop.proxyuser.oozie.hosts": "*",
          "hadoop.proxyuser.oozie.groups": "*",
          "hadoop.proxyuser.hive.groups": "*",
          "hadoop.proxyuser.livy.groups": "*",
          "hadoop.proxyuser.hbase.groups": "*",
          "hadoop.proxyuser.hbase.hosts": "*",
          "hadoop.proxyuser.root.hosts": "*",
          "hadoop.proxyuser.hive.hosts": "*",
          "hadoop.proxyuser.yarn.hosts": "*"
        }
      }
    },
    {
      "hdfs-site": {
        "properties": {
          "dfs.namenode.safemode.threshold-pct": "0.99",
          "dfs.client.failover.proxy.provider.mycluster": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
          "dfs.ha.automatic-failover.enabled": "true",
          "dfs.ha.fencing.methods": "shell(/bin/true)",
          "dfs.ha.namenodes.mycluster": "nn1,nn2",
          "dfs.namenode.http-address": "%HOSTGROUP::master_nn1%:50070",
          "dfs.namenode.http-address.mycluster.nn1": "%HOSTGROUP::master_nn1%:50070",
          "dfs.namenode.http-address.mycluster.nn2": "%HOSTGROUP::master_nn2%:50070",
          "dfs.namenode.https-address": "%HOSTGROUP::master_nn1%:50470",
          "dfs.namenode.https-address.mycluster.nn1": "%HOSTGROUP::master_nn1%:50470",
          "dfs.namenode.https-address.mycluster.nn2": "%HOSTGROUP::master_nn2%:50470",
          "dfs.namenode.rpc-address.mycluster.nn1": "%HOSTGROUP::master_nn1%:8020",
          "dfs.namenode.rpc-address.mycluster.nn2": "%HOSTGROUP::master_nn2%:8020",
          "dfs.namenode.shared.edits.dir": "qjournal://%HOSTGROUP::master_nn1%:8485;%HOSTGROUP::master_nn2%:8485;%HOSTGROUP::master_mgmt%:8485/mycluster",
          "dfs.nameservices": "mycluster"
        }
      }
    },
    {
      "hive-site": {
        "properties": {
          "hive.metastore.uris": "thrift://%HOSTGROUP::master_mgmt%:9083",
          "hive.exec.compress.output": "true",
          "hive.merge.mapfiles": "true",
          "hive.server2.tez.initialize.default.sessions": "true",
          "hive.server2.transport.mode": "http"
        }
      }
    },
    {
      "mapred-site": {
        "properties": {
          "mapreduce.job.reduce.slowstart.completedmaps": "0.7",
          "mapreduce.map.output.compress": "true",
          "mapreduce.output.fileoutputformat.compress": "true"
        }
      }
    },
    {
      "yarn-site": {
        "properties": {
          "hadoop.registry.rm.enabled": "true",
          "hadoop.registry.zk.quorum": "%HOSTGROUP::master_nn1%:2181,%HOSTGROUP::master_nn2%:2181,%HOSTGROUP::master_mgmt%:2181",
          "yarn.log.server.url": "http://%HOSTGROUP::master_nn2%:19888/jobhistory/logs",
          "yarn.resourcemanager.address": "%HOSTGROUP::master_nn1%:8050",
          "yarn.resourcemanager.admin.address": "%HOSTGROUP::master_nn1%:8141",
          "yarn.resourcemanager.cluster-id": "yarn-cluster",
          "yarn.resourcemanager.ha.automatic-failover.zk-base-path": "/yarn-leader-election",
          "yarn.resourcemanager.ha.enabled": "true",
          "yarn.resourcemanager.ha.rm-ids": "rm1,rm2",
          "yarn.resourcemanager.hostname": "%HOSTGROUP::master_nn1%",
          "yarn.resourcemanager.hostname.rm1": "%HOSTGROUP::master_nn1%",
          "yarn.resourcemanager.hostname.rm2": "%HOSTGROUP::master_nn2%",
          "yarn.resourcemanager.recovery.enabled": "true",
          "yarn.resourcemanager.resource-tracker.address": "%HOSTGROUP::master_nn1%:8025",
          "yarn.resourcemanager.scheduler.address": "%HOSTGROUP::master_nn1%:8030",
          "yarn.resourcemanager.store.class": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
          "yarn.resourcemanager.webapp.address": "%HOSTGROUP::master_nn1%:8088",
          "yarn.resourcemanager.webapp.address.rm1": "%HOSTGROUP::master_nn1%:8088",
          "yarn.resourcemanager.webapp.address.rm2": "%HOSTGROUP::master_nn2%:8088",
          "yarn.resourcemanager.webapp.https.address": "%HOSTGROUP::master_nn1%:8090",
          "yarn.resourcemanager.webapp.https.address.rm1": "%HOSTGROUP::master_nn1%:8090",
          "yarn.resourcemanager.webapp.https.address.rm2": "%HOSTGROUP::master_nn2%:8090",
          "yarn.timeline-service.address": "%HOSTGROUP::master_nn1%:10200",
          "yarn.timeline-service.webapp.address": "%HOSTGROUP::master_nn1%:8188",
          "yarn.timeline-service.webapp.https.address": "%HOSTGROUP::master_nn1%:8190"
        }
      }
    }
  ]
}
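Before uploading, you can sanity-check a blueprint file locally. The sketch below is plain Python, not part of Cloudbreak or Ambari; it parses blueprint JSON and summarizes each host group, flagging where SOLR_SERVER will land:

```python
import json

def summarize_blueprint(blueprint_json: str):
    """Return (name, stack, [(host_group, cardinality, has_solr)]) for a blueprint."""
    bp = json.loads(blueprint_json)
    info = bp["Blueprints"]
    groups = [
        (hg["name"], hg["cardinality"],
         any(c["name"] == "SOLR_SERVER" for c in hg["components"]))
        for hg in bp["host_groups"]
    ]
    return info["blueprint_name"], f'{info["stack_name"]}-{info["stack_version"]}', groups

# Example with a trimmed-down version of the blueprint above:
sample = '''{"Blueprints": {"blueprint_name": "hdp301-ha-solr",
                            "stack_name": "HDP", "stack_version": "3.0"},
             "host_groups": [{"name": "datanode", "cardinality": "1+",
                              "components": [{"name": "DATANODE"},
                                             {"name": "SOLR_SERVER"}]}]}'''
name, stack, groups = summarize_blueprint(sample)
print(name, stack, groups)  # hdp301-ha-solr HDP-3.0 [('datanode', '1+', True)]
```

Running it against the full blueprint file should show SOLR_SERVER only in the datanode host group, with cardinality 1+.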

2. Register Management Pack

HDPSearch is installed via an Ambari Management Pack. To automate the deployment of HDPSearch via a blueprint, you need to register the HDPSearch Management Pack with Cloudbreak.

In the left menu, click on Cluster Extensions. This will expand to show Recipes and Management Packs. Click on Management Packs. You should see something similar to the following:

93476-cb-management-pack-list.png

Click on REGISTER MANAGEMENT PACK. You should see something similar to the following:

93477-cb-create-management-pack.png

Enter a unique Name and meaningful Description. The Management Pack URL for the HDPSearch 4.0 Management Pack should be http://public-repo-1.hortonworks.com/HDP-SOLR/hdp-solr-ambari-mp/solr-service-mpack-4.0.0.tar.gz.
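If you would like to verify the management pack before registering it, you can download the tarball and list its contents without extracting it. These commands assume curl and tar are available; the exact file layout inside the archive may differ:

```shell
# Download the HDPSearch 4.0 management pack tarball (URL from this article)
curl -O http://public-repo-1.hortonworks.com/HDP-SOLR/hdp-solr-ambari-mp/solr-service-mpack-4.0.0.tar.gz

# List the archive contents without extracting, to confirm it is a valid tar.gz
tar -tzf solr-service-mpack-4.0.0.tar.gz | head
```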

Click Create. You should see something similar to the following:

93478-cb-management-pack-list-2.png

3. Create Cluster

Now that we have a custom blueprint based on HDP 3.0 with a Solr component and we have registered the HDPSearch 4.0 Management Pack, we are ready to create a cluster.

In the left menu, click on Clusters. Cloudbreak will display configured clusters. Click the CREATE CLUSTER button. Cloudbreak will display the Create Cluster wizard.

a. General Configuration

By default, the General Configuration screen is displayed using the BASIC view. The ADVANCED view gives you more control over AWS and cluster settings, including features such as tags. You must use the ADVANCED view to attach a Management Pack to a cluster. You can switch to the ADVANCED view manually, or you can change your Cloudbreak preferences to show the ADVANCED view by default. You should see something similar to the following:

93479-cb-general-configuration.png

  • Select Credential: Select the AWS credential you created. Most users will only have 1 credential per platform which will be selected automatically.

  • Cluster Name: Enter a name for your cluster. The name must be between 5 and 40 characters, must start with a letter, and must only include lowercase letters, numbers, and hyphens.

  • Region: Select the region in which you would like to launch your cluster.

  • Availability Zone: Select the availability zone in which you would like to launch your cluster.

  • Platform Version: Cloudbreak currently defaults to HDP 2.6. Select the dropdown arrow and select HDP 3.0.

  • Cluster Type: Select the custom blueprint you recently created.

You should see something similar to the following:

93480-cb-general-configuration-2.png

Click the green Next button.
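The cluster-name rules above can be expressed as a quick local check. This is only a sketch of the constraints as stated (5 to 40 characters, starts with a letter, lowercase letters, numbers, and hyphens only), not Cloudbreak's actual validation code:

```python
import re

# Pattern derived from the rules above: first character a lowercase letter,
# then 4-39 more characters from [a-z0-9-], for 5-40 characters total.
CLUSTER_NAME_RE = re.compile(r"^[a-z][a-z0-9-]{4,39}$")

def is_valid_cluster_name(name: str) -> bool:
    return bool(CLUSTER_NAME_RE.match(name))

print(is_valid_cluster_name("hdp30-solr-demo"))  # True
print(is_valid_cluster_name("HDP-cluster"))      # False: uppercase not allowed
print(is_valid_cluster_name("abc"))              # False: shorter than 5 characters
```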

b. Image Settings

Cloudbreak will display the Image Settings screen. This is where you can specify a custom Cloudbreak image or change the version of Ambari and HDP used in the cluster. You should see something similar to the following:

93481-cb-image-settings.png

You do not need to change any settings on this page. Click the green NEXT button.

c. Hardware and Storage

Cloudbreak will display the Hardware and Storage screen. On this screen, you have the ability to change the instance types, the attached storage, and where the Ambari server will be installed. As you can see, the blueprint calls for deploying at least 4 nodes. We will use the defaults.

93482-cb-hardware-storage.png

Click the green Next button.

d. Network and Availability

Cloudbreak will display the Network and Availability screen. On this screen, you have the ability to create a new VPC and Subnet or select from existing ones. The default is to create a new VPC and Subnet. We will use the defaults.

93483-cb-network-availability.png

Click the green Next button.

e. Cloud Storage

Cloudbreak will display the Cloud Storage screen. On this screen, you have the ability to configure your cluster to have an instance profile allowing the cluster to access data on cloud storage. The default is to not configure cloud storage. We will use the defaults.

93484-cb-cloud-storage.png

Click the green Next button.

f. Cluster Extensions

Cloudbreak will display the Cluster Extensions screen. On this screen, you have the ability to associate recipes with different host groups and attach management packs to the cluster. You should see something similar to the following:

93485-cb-cluster-extensions.png

This screen is where we attach the HDPSearch 4.0 management pack we registered previously. Select the dropdown under Available Management Packs, select the HDPSearch 4.0 management pack you registered, then click the Install button. You should see something similar to the following:

93486-cb-cluster-extensions-2.png

Click the green Next button.

g. External Sources

Cloudbreak will display the External Sources screen. On this screen, you have the ability to associate external sources like LDAP/AD and databases. You should see something similar to the following:

93487-cb-external-sources.png

We will not be using this functionality with this cluster. Click the green Next button.

h. Gateway Configuration

Cloudbreak will display the Gateway Configuration screen. On this screen, you have the ability to enable a protected gateway. This gateway uses Knox to provide a secure access point for the cluster. You should see something similar to the following:

93488-cb-gateway-configuration.png

We will use the defaults. Click the green Next button.

i. Network Security Groups

Cloudbreak will display the Network Security Groups screen. On this screen, you have the ability to specify the Network Security Groups. You should see something similar to the following:

93489-cb-network-security-groups.png

Cloudbreak defaults to creating new security group configurations. For production use cases, we highly recommend creating and refining your own definitions within the cloud platform; you can tell Cloudbreak to use those existing security groups by selecting the radio button. We need to add Solr's default port, 8983, to the host group where Solr will run, which is the Data Node group in this blueprint. I recommend that you specify "MyIP" to limit access to this port. You should see something similar to the following:
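If you manage your own security groups in AWS instead, the equivalent ingress rule can be added with the AWS CLI. This is only a sketch; the security group ID and CIDR below are placeholders you would replace with your own values:

```shell
# Open Solr's default port 8983 to a single trusted IP, mirroring the
# "MyIP" recommendation above. The group ID and CIDR are placeholders.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 8983 \
  --cidr 203.0.113.10/32
```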

93490-cb-network-security-groups-2.png

Click the green Next button.

j. Security

Cloudbreak will display the Security screen. On this screen, you have the ability to specify the Ambari admin username and password. You can create a new SSH key or select an existing one. Finally, you have the ability to enable Kerberos on the cluster. We will use admin for the username and BadPass#1 for the password. Select an existing SSH key from the dropdown list; this should be a key you have already created and whose private key you have access to. We will NOT be enabling Kerberos, so make sure the Enable Kerberos Security checkbox is not checked. You should see something similar to the following:

93491-cb-security.png

Click the green CREATE CLUSTER button.

k. Cluster Summary

Cloudbreak will display the Cluster Summary page. It will generally take 10 to 15 minutes for the cluster to be fully deployed. Click on the cluster you just created. You should see something similar to the following:

93492-cb-cluster-summary.png

Click on the Ambari URL to open the Ambari UI.

l. Ambari

You will likely see a browser warning when you first open the Ambari UI. That is because we are using self-signed certificates.

93493-cb-ambari-warning.png

Click on the ADVANCED button. Then click the link to Proceed.

93494-cb-ambari-warning-2.png

You will be presented with the Ambari login page. Log in using the username and password you specified when you created the cluster; that should have been admin and BadPass#1. Click the green Sign In button.

93495-cb-ambari-login.png

You should see the cluster summary screen. As you can see, we have a cluster which includes the Solr component.

93496-cb-ambari-summary.png

Click on the Solr service in the left hand menu. Now you can access the Quick Links menu for a shortcut to the Solr UI.

You should see the Solr UI. As you can see, this is Solr 7.4.

93497-cb-solr-summary.png
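You can also confirm the Solr version from the command line via Solr's system info API. The hostname below is a placeholder for one of your data nodes:

```shell
# Query Solr's system info endpoint; "datanode-host" is a placeholder for
# a data node's public hostname. The JSON response includes a "lucene"
# section reporting the solr-spec-version (7.4.x for HDPSearch 4.0).
curl "http://datanode-host:8983/solr/admin/info/system?wt=json"
```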

Review

If you have successfully followed along with this tutorial, you should have created a custom HDP 3.0 blueprint which includes the Solr component, registered the HDPSearch 4.0 Management Pack, and successfully deployed a cluster on AWS which includes HDPSearch 4.0.
