HDPSearch 4.0 was recently announced (Blog), upgrading Solr from 6.6 to 7.4. The HDPSearch 4.0 Ambari management pack will install HDPSearch 3.0 on HDP 2.6 and HDPSearch 4.0 on HDP 3.0. HDP 3.0 is required for HDPSearch 4.0 because the HDFS and Hive libraries have been updated for Hadoop 3.1. Using Cloudbreak 2.8 Tech Preview (TP), you can install an HDP 3.0 cluster that includes HDPSearch 4.0 via Cloudbreak's management pack extensions.
Cloudbreak 2.8 is a Tech Preview release and is not suitable for production usage. Similarly, Cloudbreak 2.8 TP does not officially support deploying HDP 3.0 clusters. The intent of this tutorial is to become familiar with the process before Cloudbreak 2.9 is released.
This tutorial is designed to walk you through the process of deploying an HDP 3.0 cluster which includes HDPSearch 4.0 components on AWS using a custom Ambari blueprint.
This tutorial was tested in the following environment:
We need to create a custom Ambari blueprint for an HDP 3.0 cluster. This tutorial provides a basic blueprint which has HDFS and YARN HA enabled.
Log in to your Cloudbreak instance. In the left menu, click on Blueprints. Cloudbreak will display a list of built-in and custom blueprints. Click on the CREATE BLUEPRINT button. You should see something similar to the following:
If you have downloaded the blueprint JSON file, you can simply upload the file to create your new blueprint. Cloudbreak requires a unique name within the blueprint itself; if you wish to customize the blueprint name, you can edit it in the editor window after uploading the blueprint. Enter a unique Name and a meaningful Description for the blueprint; these are displayed on the blueprint list screen. You can download the JSON blueprint file here: hdp301-ha-solr-blueprint.json
Click on the Upload JSON File button and select the blueprint JSON file you downloaded. You should see something similar to this:
Scroll to the bottom and click on the CREATE button. You should see the list of blueprints, including the newly created blueprint. You should see something similar to the following:
You can also choose to paste the JSON text by clicking on the Text radio button.
Here is the text of the blueprint JSON:
{ "Blueprints": { "blueprint_name": "hdp301-ha-solr", "stack_name": "HDP", "stack_version": "3.0" }, "settings": [ { "recovery_settings": [] }, { "service_settings": [ { "name": "HIVE", "credential_store_enabled": "false" } ] }, { "component_settings": [] } ], "host_groups": [ { "name": "master_mgmt", "components": [ { "name": "METRICS_COLLECTOR" }, { "name": "METRICS_GRAFANA" }, { "name": "ZOOKEEPER_SERVER" }, { "name": "JOURNALNODE" }, { "name": "INFRA_SOLR" }, { "name": "INFRA_SOLR_CLIENT" }, { "name": "METRICS_MONITOR" }, { "name": "ZOOKEEPER_CLIENT" }, { "name": "HDFS_CLIENT" }, { "name": "YARN_CLIENT" }, { "name": "OOZIE_CLIENT" }, { "name": "MAPREDUCE2_CLIENT" }, { "name": "HIVE_CLIENT" }, { "name": "TEZ_CLIENT" }, { "name": "HIVE_METASTORE" }, { "name": "HIVE_SERVER" } ], "cardinality": "1" }, { "name": "master_nn1", "components": [ { "name": "NAMENODE" }, { "name": "ZKFC" }, { "name": "RESOURCEMANAGER" }, { "name": "METRICS_MONITOR" }, { "name": "APP_TIMELINE_SERVER" }, { "name": "ZOOKEEPER_SERVER" }, { "name": "JOURNALNODE" }, { "name": "HIVE_CLIENT" }, { "name": "HDFS_CLIENT" }, { "name": "YARN_CLIENT" }, { "name": "OOZIE_CLIENT" }, { "name": "ZOOKEEPER_CLIENT" }, { "name": "LIVY2_SERVER" }, { "name": "SPARK2_CLIENT" }, { "name": "MAPREDUCE2_CLIENT" }, { "name": "TEZ_CLIENT" } ], "cardinality": "1" }, { "name": "master_nn2", "components": [ { "name": "NAMENODE" }, { "name": "ZKFC" }, { "name": "RESOURCEMANAGER" }, { "name": "METRICS_MONITOR" }, { "name": "HISTORYSERVER" }, { "name": "HIVE_SERVER" }, { "name": "PIG" }, { "name": "OOZIE_SERVER" }, { "name": "ZOOKEEPER_SERVER" }, { "name": "JOURNALNODE" }, { "name": "HIVE_CLIENT" }, { "name": "HDFS_CLIENT" }, { "name": "YARN_CLIENT" }, { "name": "OOZIE_CLIENT" }, { "name": "ZOOKEEPER_CLIENT" }, { "name": "SPARK2_JOBHISTORYSERVER" }, { "name": "SPARK2_CLIENT" }, { "name": "MAPREDUCE2_CLIENT" }, { "name": "TEZ_CLIENT" } ], "cardinality": "1" }, { "name": "datanode", "components": [ { "name": "HIVE_CLIENT" }, 
{ "name": "TEZ_CLIENT" }, { "name": "SPARK2_CLIENT" }, { "name": "YARN_CLIENT" }, { "name": "OOZIE_CLIENT" }, { "name": "DATANODE" }, { "name": "METRICS_MONITOR" }, { "name": "NODEMANAGER" }, { "name": "SOLR_SERVER" } ], "cardinality": "1+" } ], "configurations": [ { "core-site": { "properties": { "fs.trash.interval": "4320", "fs.defaultFS": "hdfs://mycluster", "ha.zookeeper.quorum": "%HOSTGROUP::master_nn1%:2181,%HOSTGROUP::master_nn2%:2181,%HOSTGROUP::master_mgmt%:2181", "hadoop.proxyuser.falcon.groups": "*", "hadoop.proxyuser.root.groups": "*", "hadoop.proxyuser.livy.hosts": "*", "hadoop.proxyuser.falcon.hosts": "*", "hadoop.proxyuser.oozie.hosts": "*", "hadoop.proxyuser.oozie.groups": "*", "hadoop.proxyuser.hive.groups": "*", "hadoop.proxyuser.livy.groups": "*", "hadoop.proxyuser.hbase.groups": "*", "hadoop.proxyuser.hbase.hosts": "*", "hadoop.proxyuser.root.hosts": "*", "hadoop.proxyuser.hive.hosts": "*", "hadoop.proxyuser.yarn.hosts": "*" } } }, { "hdfs-site": { "properties": { "dfs.namenode.safemode.threshold-pct": "0.99", "dfs.client.failover.proxy.provider.mycluster": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", "dfs.ha.automatic-failover.enabled": "true", "dfs.ha.fencing.methods": "shell(/bin/true)", "dfs.ha.namenodes.mycluster": "nn1,nn2", "dfs.namenode.http-address": "%HOSTGROUP::master_nn1%:50070", "dfs.namenode.http-address.mycluster.nn1": "%HOSTGROUP::master_nn1%:50070", "dfs.namenode.http-address.mycluster.nn2": "%HOSTGROUP::master_nn2%:50070", "dfs.namenode.https-address": "%HOSTGROUP::master_nn1%:50470", "dfs.namenode.https-address.mycluster.nn1": "%HOSTGROUP::master_nn1%:50470", "dfs.namenode.https-address.mycluster.nn2": "%HOSTGROUP::master_nn2%:50470", "dfs.namenode.rpc-address.mycluster.nn1": "%HOSTGROUP::master_nn1%:8020", "dfs.namenode.rpc-address.mycluster.nn2": "%HOSTGROUP::master_nn2%:8020", "dfs.namenode.shared.edits.dir": 
"qjournal://%HOSTGROUP::master_nn1%:8485;%HOSTGROUP::master_nn2%:8485;%HOSTGROUP::master_mgmt%:8485/mycluster", "dfs.nameservices": "mycluster" } } }, { "hive-site": { "properties": { "hive.metastore.uris": "thrift://%HOSTGROUP::master_mgmt%:9083", "hive.exec.compress.output": "true", "hive.merge.mapfiles": "true", "hive.server2.tez.initialize.default.sessions": "true", "hive.server2.transport.mode": "http" } } }, { "mapred-site": { "properties": { "mapreduce.job.reduce.slowstart.completedmaps": "0.7", "mapreduce.map.output.compress": "true", "mapreduce.output.fileoutputformat.compress": "true" } } }, { "yarn-site": { "properties": { "hadoop.registry.rm.enabled": "true", "hadoop.registry.zk.quorum": "%HOSTGROUP::master_nn1%:2181,%HOSTGROUP::master_nn2%:2181,%HOSTGROUP::master_mgmt%:2181", "yarn.log.server.url": "http://%HOSTGROUP::master_nn2%:19888/jobhistory/logs", "yarn.resourcemanager.address": "%HOSTGROUP::master_nn1%:8050", "yarn.resourcemanager.admin.address": "%HOSTGROUP::master_nn1%:8141", "yarn.resourcemanager.cluster-id": "yarn-cluster", "yarn.resourcemanager.ha.automatic-failover.zk-base-path": "/yarn-leader-election", "yarn.resourcemanager.ha.enabled": "true", "yarn.resourcemanager.ha.rm-ids": "rm1,rm2", "yarn.resourcemanager.hostname": "%HOSTGROUP::master_nn1%", "yarn.resourcemanager.hostname.rm1": "%HOSTGROUP::master_nn1%", "yarn.resourcemanager.hostname.rm2": "%HOSTGROUP::master_nn2%", "yarn.resourcemanager.recovery.enabled": "true", "yarn.resourcemanager.resource-tracker.address": "%HOSTGROUP::master_nn1%:8025", "yarn.resourcemanager.scheduler.address": "%HOSTGROUP::master_nn1%:8030", "yarn.resourcemanager.store.class": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore", "yarn.resourcemanager.webapp.address": "%HOSTGROUP::master_nn1%:8088", "yarn.resourcemanager.webapp.address.rm1": "%HOSTGROUP::master_nn1%:8088", "yarn.resourcemanager.webapp.address.rm2": "%HOSTGROUP::master_nn2%:8088", 
"yarn.resourcemanager.webapp.https.address": "%HOSTGROUP::master_nn1%:8090", "yarn.resourcemanager.webapp.https.address.rm1": "%HOSTGROUP::master_nn1%:8090", "yarn.resourcemanager.webapp.https.address.rm2": "%HOSTGROUP::master_nn2%:8090", "yarn.timeline-service.address": "%HOSTGROUP::master_nn1%:10200", "yarn.timeline-service.webapp.address": "%HOSTGROUP::master_nn1%:8188", "yarn.timeline-service.webapp.https.address": "%HOSTGROUP::master_nn1%:8190" } } } ] }
HDPSearch is installed via an Ambari Management Pack. To automate the deployment of HDPSearch via a blueprint, you need to register the HDPSearch Management Pack with Cloudbreak.
In the left menu, click on Cluster Extensions. This will expand to show Recipes and Management Packs. Click on Management Packs. You should see something similar to the following:
Click on REGISTER MANAGEMENT PACK. You should see something similar to the following:
Enter a unique Name and meaningful Description. The Management Pack URL for the HDPSearch 4.0 Management Pack should be http://public-repo-1.hortonworks.com/HDP-SOLR/hdp-solr-ambari-mp/solr-service-mpack-4.0.0.tar.gz.
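If you would like to inspect the management pack before registering it, you can download and list the tarball yourself. This is a sketch: it assumes outbound access to the Hortonworks public repo, so the download only runs when FETCH_MPACK is set.

```shell
# Management pack URL from the registration step above
MPACK_URL="http://public-repo-1.hortonworks.com/HDP-SOLR/hdp-solr-ambari-mp/solr-service-mpack-4.0.0.tar.gz"

if [ -n "${FETCH_MPACK:-}" ]; then
  # Download the mpack and list its contents (service definitions, metainfo, etc.)
  curl -sSLO "$MPACK_URL"
  tar -tzf "$(basename "$MPACK_URL")" | head
else
  echo "set FETCH_MPACK=1 to download $(basename "$MPACK_URL")"
fi
```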
Click Create. You should see something similar to the following:
Now that we have a custom blueprint based on HDP 3.0 with a Solr component and we have registered the HDPSearch 4.0 Management Pack, we are ready to create a cluster.
In the left menu, click on Clusters. Cloudbreak will display the configured clusters. Click the CREATE CLUSTER button. Cloudbreak will display the Create Cluster wizard.
By default, the General Configuration screen is displayed using the BASIC view. The ADVANCED view gives you more control of AWS and cluster settings, including features such as tags. You must use the ADVANCED view to attach a Management Pack to a cluster. You can switch to the ADVANCED view manually, or you can change your Cloudbreak preferences to show the ADVANCED view by default. You should see something similar to the following:
Select Credential: Select the AWS credential you created. Most users will have only one credential per platform, which is selected automatically.
Cluster Name: Enter a name for your cluster. The name must be between 5 and 40 characters, must start with a letter, and may only include lowercase letters, numbers, and hyphens.
Region: Select the region in which you would like to launch your cluster.
Availability Zone: Select the availability zone in which you would like to launch your cluster.
Platform Version: Cloudbreak currently defaults to HDP 2.6. Select the dropdown arrow and select HDP 3.0.
Cluster Type: Select the custom blueprint you created earlier.
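The cluster-name rules above can be checked up front with a simple pattern. This is a sketch; Cloudbreak enforces the same rules in the wizard, so this is only a convenience for scripting:

```shell
# 5-40 characters total, starting with a lowercase letter,
# followed by lowercase letters, digits, or hyphens
valid_cluster_name() { printf '%s' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,39}$'; }

valid_cluster_name "hdp3-solr-demo" && echo "hdp3-solr-demo: ok"
valid_cluster_name "HDP3"           || echo "HDP3: rejected (uppercase, too short)"
```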
You should see something similar to the following:
Click the green Next button.
Cloudbreak will display the Image Settings screen. This is where you can specify a custom Cloudbreak image or change the version of Ambari and HDP used in the cluster. You should see something similar to the following:
You do not need to change any settings on this page. Click the green NEXT button.
Cloudbreak will display the Hardware and Storage screen. On this screen, you have the ability to change the instance types and attached storage, and to choose where the Ambari server will be installed. As you can see, the blueprint calls for deploying at least 4 nodes. We will use the defaults.
Click the green Next button.
Cloudbreak will display the Network and Availability screen. On this screen, you have the ability to create a new VPC and Subnet or select from existing ones. The default is to create a new VPC and Subnet. We will use the defaults.
Click the green Next button.
Cloudbreak will display the Cloud Storage screen. On this screen, you have the ability to configure your cluster with an instance profile allowing the cluster to access data on cloud storage. The default is to not configure cloud storage. We will use the defaults.
Click the green Next button.
Cloudbreak will display the Cluster Extensions screen. On this screen, you have the ability to associate recipes with different host groups and attach management packs to the cluster. You should see something similar to the following:
This screen is where we attach the HDPSearch 4.0 management pack we registered previously. Select the dropdown under Available Management Packs, select the HDPSearch 4.0 management pack you registered, and then click the Install button. You should see something similar to the following:
Click the green Next button.
Cloudbreak will display the External Sources screen. On this screen, you have the ability to associate external sources like LDAP/AD and databases. You should see something similar to the following:
We will not be using this functionality with this cluster. Click the green Next button.
Cloudbreak will display the Gateway Configuration screen. On this screen, you have the ability to enable a protected gateway. This gateway uses Knox to provide a secure access point for the cluster. You should see something similar to the following:
We will use the defaults. Click the green Next button.
Cloudbreak will display the Network Security Groups screen. On this screen, you have the ability to specify the Network Security Groups. You should see something similar to the following:
Cloudbreak defaults to creating new security group configurations. For production use cases, we highly recommend creating and refining your own definitions within the cloud platform; you can tell Cloudbreak to use those existing security groups by selecting the radio button. We need to add the Solr default port of 8983 to the host group where Solr will run, which is the Data Node in this blueprint. I recommend that you specify "My IP" to limit access to this port. You should see something similar to the following:
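If you instead point Cloudbreak at existing security groups, the equivalent ingress rule can be added with the AWS CLI. This is a sketch: SG_ID and MY_IP are placeholders for your data-node security group and your workstation IP, so the command only runs when both are set.

```shell
SG_ID="${SG_ID:-}"   # placeholder, e.g. sg-0123456789abcdef0
MY_IP="${MY_IP:-}"   # placeholder, e.g. 203.0.113.10

if [ -n "$SG_ID" ] && [ -n "$MY_IP" ]; then
  # Open the Solr default port 8983 to your IP address only
  aws ec2 authorize-security-group-ingress \
    --group-id "$SG_ID" --protocol tcp --port 8983 --cidr "${MY_IP}/32"
else
  echo "set SG_ID and MY_IP to open port 8983 to your address"
fi
```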
Click the green Next button.
Cloudbreak will display the Security screen. On this screen, you specify the Ambari admin username and password, create a new SSH key or select an existing one, and optionally enable Kerberos on the cluster. We will use admin for the username and BadPass#1 for the password. Select an existing SSH key from the dropdown list; this should be a key you have already created and whose corresponding private key you have access to. We will NOT be enabling Kerberos, so make sure the Enable Kerberos Security checkbox is not checked. You should see something similar to the following:
Click the green CREATE CLUSTER button.
Cloudbreak will display the Cluster Summary page. It will generally take 10 to 15 minutes for the cluster to be fully deployed. Click on the cluster you just created. You should see something similar to the following:
Click on the Ambari URL to open the Ambari UI.
You will likely see a browser warning when you first open the Ambari UI. That is because we are using self-signed certificates.
Click on the ADVANCED button, then click the link to Proceed.
You will be presented with the Ambari login page. Log in using the username and password you specified when you created the cluster; that should have been admin and BadPass#1. Click the green Sign In button.
You should see the cluster summary screen. As you can see, we have a cluster which includes the Solr component.
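Since the blueprint enables HDFS and YARN HA, you can also verify the failover state from the command line. This is a sketch meant to be run on a cluster node where the Hadoop clients are installed; the nn1/nn2 and rm1/rm2 IDs come from the blueprint above.

```shell
if command -v hdfs >/dev/null 2>&1; then
  # Each NameNode should report "active" or "standby"
  hdfs haadmin -getServiceState nn1
  hdfs haadmin -getServiceState nn2
  # Likewise for the ResourceManagers
  yarn rmadmin -getServiceState rm1
  yarn rmadmin -getServiceState rm2
else
  echo "run this on a cluster node with the hdfs and yarn clients installed"
fi
```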
Click on the Solr service in the left-hand menu. Now you can access the Quick Links menu for a shortcut to the Solr UI.
You should see the Solr UI. As you can see, this is Solr 7.4.
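The Solr version can also be confirmed from the command line via Solr's system info API. This is a sketch: SOLR_HOST is a placeholder for your data node's public hostname, so the call only runs when it is set.

```shell
SOLR_HOST="${SOLR_HOST:-}"   # placeholder: your data node's public DNS name

if [ -n "$SOLR_HOST" ]; then
  # /solr/admin/info/system reports Solr and Lucene version details
  curl -s "http://${SOLR_HOST}:8983/solr/admin/info/system?wt=json" \
    | python3 -c 'import json,sys; print(json.load(sys.stdin)["lucene"]["solr-spec-version"])'
else
  echo "set SOLR_HOST to your Solr node's hostname first"
fi
```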
If you have successfully followed along with this tutorial, you have created a custom HDP 3.0 blueprint which includes the Solr component, registered the HDPSearch 4.0 Management Pack, and deployed a cluster on AWS which includes HDPSearch 4.0.