Member since: 02-09-2016
Posts: 559
Kudos Received: 422
Solutions: 98
My Accepted Solutions
Views | Posted
---|---
2804 | 03-02-2018 01:19 AM
4471 | 03-02-2018 01:04 AM
3023 | 08-02-2017 05:40 PM
2808 | 07-17-2017 05:35 PM
2070 | 07-10-2017 02:49 PM
03-30-2020
11:45 AM
Can I use pyhive to connect to Hive using a Hive JDBC string instead of a single hostname? The following doesn't work for me:
from pyhive import hive
hive_conn = hive.Connection(host=<JDBC STRING>, configuration={'serviceDiscoveryMode': 'zooKeeper', 'zooKeeperNamespace': 'hiveserver2'})
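As far as I know, pyhive expects a concrete hostname and port rather than a JDBC URL, and it does not perform ZooKeeper service discovery on its own. One workaround is to resolve a live HiveServer2 instance from the ZooKeeper namespace yourself (for example with the kazoo package) and then pass that host to pyhive. A rough sketch, assuming kazoo is installed, the znodes live under /hiveserver2, and the ZooKeeper hosts below are placeholders:
# Sketch only: resolve a HiveServer2 host from ZooKeeper, then connect with pyhive.
from kazoo.client import KazooClient
from pyhive import hive
zk = KazooClient(hosts='zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181')
zk.start()
try:
    # Each child node name looks roughly like: serverUri=hs2-host:10000;version=...;sequence=...
    children = zk.get_children('/hiveserver2')
finally:
    zk.stop()
server_uri = children[0].split(';')[0].split('=')[1]   # e.g. "hs2-host:10000"
host, port = server_uri.split(':')
conn = hive.Connection(host=host, port=int(port), username='hive')
cursor = conn.cursor()
cursor.execute('SHOW DATABASES')
print(cursor.fetchall())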
... View more
11-12-2019
11:28 PM
We are stuck with the same issue. We are unable to receive the HL7 messages. Could you guide us with the custom processor creation? Any help in this regard will be appreciated. Thank you.
... View more
09-18-2019
06:55 AM
@myoung Can you please give the syntax to write this sort of query?
... View more
10-30-2018
05:48 PM
4 Kudos
Objectives
HDPSearch 4.0 was recently announced (Blog), upgrading Solr from 6.6 to 7.4. The HDPSearch 4.0 Ambari management pack will install HDPSearch 3.0 on HDP 2.6 and HDPSearch 4.0 on HDP 3.0. HDP 3.0 is required for HDPSearch 4.0 because the HDFS and Hive libraries have been updated for Hadoop 3.1. Using Cloudbreak 2.8 Tech Preview (TP), you can install an HDP 3.0 cluster that includes HDPSearch 4.0 using Cloudbreak's management pack extensions.
Cloudbreak 2.8 is a Tech Preview release and is not suitable for production use. Similarly, Cloudbreak 2.8 TP doesn't officially support deploying HDP 3.0 clusters. The intent is to become familiar with the process ahead of the Cloudbreak 2.9 release.
This tutorial is designed to walk you through the process of deploying an HDP 3.0 cluster which includes the HDPSearch 4.0 components on AWS using a custom Ambari blueprint.
Prerequisites
You should already have an installed version of Cloudbreak 2.8.
You can find the documentation on Cloudbreak here: Cloudbreak Documentation
You can find an article that walks you through installing a local version of Cloudbreak with Vagrant and Virtualbox here: HCC Article
You should have an AWS account with appropriate permissions.
You can read more about AWS permissions here: Cloudbreak Documentation
You should already have created your AWS credential in Cloudbreak.
You should be familiar with HDPSearch.
You can find the documentation on HDPSearch here: HDPSearch 4.0 Documentation
Scope
This tutorial was tested in the following environment:
Cloudbreak 2.8.0
HDPSearch 4.0
AWS (also works on Azure and Google)
Steps
1. Create New HDP Blueprint
We need to create a custom Ambari blueprint for an HDP 3.0 cluster. This tutorial provides a basic blueprint which has HDFS and YARN HA enabled.
Login to your Cloudbreak instance. In the left menu, click on Blueprints . Cloudbreak will display a list of built-in and custom blueprints. Click on the CREATE BLUEPRINT button. You should see something similar to the following:
You can download the JSON blueprint file here: hdp301-ha-solr-blueprint.json. Once you have downloaded it, you can simply upload the file to create your new blueprint. Cloudbreak requires a unique name within the blueprint itself; if you wish to customize the blueprint name, you can edit it in the editor window after uploading. Enter a unique Name and a meaningful Description for the blueprint. These are displayed on the blueprint list screen.
Click on the Upload JSON File button and select the blueprint JSON file you downloaded. You should see something similar to this:
Scroll to the bottom and click on the CREATE button. You should see the list of blueprints, including the newly created blueprint. You should see something similar to the following:
You can also choose to paste the JSON text by clicking on the Text radio button.
Here is the text of the blueprint JSON:
{
"Blueprints": {
"blueprint_name": "hdp301-ha-solr",
"stack_name": "HDP",
"stack_version": "3.0"
},
"settings": [
{
"recovery_settings": []
},
{
"service_settings": [
{
"name": "HIVE",
"credential_store_enabled": "false"
}
]
},
{
"component_settings": []
}
],
"host_groups": [
{
"name": "master_mgmt",
"components": [
{
"name": "METRICS_COLLECTOR"
},
{
"name": "METRICS_GRAFANA"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "JOURNALNODE"
},
{
"name": "INFRA_SOLR"
},
{
"name": "INFRA_SOLR_CLIENT"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "HDFS_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "OOZIE_CLIENT"
},
{
"name": "MAPREDUCE2_CLIENT"
},
{
"name": "HIVE_CLIENT"
},
{
"name": "TEZ_CLIENT"
},
{
"name": "HIVE_METASTORE"
},
{
"name": "HIVE_SERVER"
}
],
"cardinality": "1"
},
{
"name": "master_nn1",
"components": [
{
"name": "NAMENODE"
},
{
"name": "ZKFC"
},
{
"name": "RESOURCEMANAGER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "APP_TIMELINE_SERVER"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "JOURNALNODE"
},
{
"name": "HIVE_CLIENT"
},
{
"name": "HDFS_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "OOZIE_CLIENT"
},
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "LIVY2_SERVER"
},
{
"name": "SPARK2_CLIENT"
},
{
"name": "MAPREDUCE2_CLIENT"
},
{
"name": "TEZ_CLIENT"
}
],
"cardinality": "1"
},
{
"name": "master_nn2",
"components": [
{
"name": "NAMENODE"
},
{
"name": "ZKFC"
},
{
"name": "RESOURCEMANAGER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "HISTORYSERVER"
},
{
"name": "HIVE_SERVER"
},
{
"name": "PIG"
},
{
"name": "OOZIE_SERVER"
},
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "JOURNALNODE"
},
{
"name": "HIVE_CLIENT"
},
{
"name": "HDFS_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "OOZIE_CLIENT"
},
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "SPARK2_JOBHISTORYSERVER"
},
{
"name": "SPARK2_CLIENT"
},
{
"name": "MAPREDUCE2_CLIENT"
},
{
"name": "TEZ_CLIENT"
}
],
"cardinality": "1"
},
{
"name": "datanode",
"components": [
{
"name": "HIVE_CLIENT"
},
{
"name": "TEZ_CLIENT"
},
{
"name": "SPARK2_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "OOZIE_CLIENT"
},
{
"name": "DATANODE"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "NODEMANAGER"
},
{
"name": "SOLR_SERVER"
}
],
"cardinality": "1+"
}
],
"configurations": [
{
"core-site": {
"properties": {
"fs.trash.interval": "4320",
"fs.defaultFS": "hdfs://mycluster",
"ha.zookeeper.quorum": "%HOSTGROUP::master_nn1%:2181,%HOSTGROUP::master_nn2%:2181,%HOSTGROUP::master_mgmt%:2181",
"hadoop.proxyuser.falcon.groups": "*",
"hadoop.proxyuser.root.groups": "*",
"hadoop.proxyuser.livy.hosts": "*",
"hadoop.proxyuser.falcon.hosts": "*",
"hadoop.proxyuser.oozie.hosts": "*",
"hadoop.proxyuser.oozie.groups": "*",
"hadoop.proxyuser.hive.groups": "*",
"hadoop.proxyuser.livy.groups": "*",
"hadoop.proxyuser.hbase.groups": "*",
"hadoop.proxyuser.hbase.hosts": "*",
"hadoop.proxyuser.root.hosts": "*",
"hadoop.proxyuser.hive.hosts": "*",
"hadoop.proxyuser.yarn.hosts": "*"
}
}
},
{
"hdfs-site": {
"properties": {
"dfs.namenode.safemode.threshold-pct": "0.99",
"dfs.client.failover.proxy.provider.mycluster": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
"dfs.ha.automatic-failover.enabled": "true",
"dfs.ha.fencing.methods": "shell(/bin/true)",
"dfs.ha.namenodes.mycluster": "nn1,nn2",
"dfs.namenode.http-address": "%HOSTGROUP::master_nn1%:50070",
"dfs.namenode.http-address.mycluster.nn1": "%HOSTGROUP::master_nn1%:50070",
"dfs.namenode.http-address.mycluster.nn2": "%HOSTGROUP::master_nn2%:50070",
"dfs.namenode.https-address": "%HOSTGROUP::master_nn1%:50470",
"dfs.namenode.https-address.mycluster.nn1": "%HOSTGROUP::master_nn1%:50470",
"dfs.namenode.https-address.mycluster.nn2": "%HOSTGROUP::master_nn2%:50470",
"dfs.namenode.rpc-address.mycluster.nn1": "%HOSTGROUP::master_nn1%:8020",
"dfs.namenode.rpc-address.mycluster.nn2": "%HOSTGROUP::master_nn2%:8020",
"dfs.namenode.shared.edits.dir": "qjournal://%HOSTGROUP::master_nn1%:8485;%HOSTGROUP::master_nn2%:8485;%HOSTGROUP::master_mgmt%:8485/mycluster",
"dfs.nameservices": "mycluster"
}
}
},
{
"hive-site": {
"properties": {
"hive.metastore.uris": "thrift://%HOSTGROUP::master_mgmt%:9083",
"hive.exec.compress.output": "true",
"hive.merge.mapfiles": "true",
"hive.server2.tez.initialize.default.sessions": "true",
"hive.server2.transport.mode": "http"
}
}
},
{
"mapred-site": {
"properties": {
"mapreduce.job.reduce.slowstart.completedmaps": "0.7",
"mapreduce.map.output.compress": "true",
"mapreduce.output.fileoutputformat.compress": "true"
}
}
},
{
"yarn-site": {
"properties": {
"hadoop.registry.rm.enabled": "true",
"hadoop.registry.zk.quorum": "%HOSTGROUP::master_nn1%:2181,%HOSTGROUP::master_nn2%:2181,%HOSTGROUP::master_mgmt%:2181",
"yarn.log.server.url": "http://%HOSTGROUP::master_nn2%:19888/jobhistory/logs",
"yarn.resourcemanager.address": "%HOSTGROUP::master_nn1%:8050",
"yarn.resourcemanager.admin.address": "%HOSTGROUP::master_nn1%:8141",
"yarn.resourcemanager.cluster-id": "yarn-cluster",
"yarn.resourcemanager.ha.automatic-failover.zk-base-path": "/yarn-leader-election",
"yarn.resourcemanager.ha.enabled": "true",
"yarn.resourcemanager.ha.rm-ids": "rm1,rm2",
"yarn.resourcemanager.hostname": "%HOSTGROUP::master_nn1%",
"yarn.resourcemanager.hostname.rm1": "%HOSTGROUP::master_nn1%",
"yarn.resourcemanager.hostname.rm2": "%HOSTGROUP::master_nn2%",
"yarn.resourcemanager.recovery.enabled": "true",
"yarn.resourcemanager.resource-tracker.address": "%HOSTGROUP::master_nn1%:8025",
"yarn.resourcemanager.scheduler.address": "%HOSTGROUP::master_nn1%:8030",
"yarn.resourcemanager.store.class": "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
"yarn.resourcemanager.webapp.address": "%HOSTGROUP::master_nn1%:8088",
"yarn.resourcemanager.webapp.address.rm1": "%HOSTGROUP::master_nn1%:8088",
"yarn.resourcemanager.webapp.address.rm2": "%HOSTGROUP::master_nn2%:8088",
"yarn.resourcemanager.webapp.https.address": "%HOSTGROUP::master_nn1%:8090",
"yarn.resourcemanager.webapp.https.address.rm1": "%HOSTGROUP::master_nn1%:8090",
"yarn.resourcemanager.webapp.https.address.rm2": "%HOSTGROUP::master_nn2%:8090",
"yarn.timeline-service.address": "%HOSTGROUP::master_nn1%:10200",
"yarn.timeline-service.webapp.address": "%HOSTGROUP::master_nn1%:8188",
"yarn.timeline-service.webapp.https.address": "%HOSTGROUP::master_nn1%:8190"
}
}
}
]
}
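If you want to rename the blueprint or do a quick sanity check before uploading it, a short script can help. This is only an illustrative sketch (it is not part of the Cloudbreak workflow); the input file name matches the download link above and the new blueprint name is a placeholder:
# Sketch: rename and sanity-check the blueprint JSON before uploading it to Cloudbreak.
import json
with open('hdp301-ha-solr-blueprint.json') as f:
    bp = json.load(f)
# The blueprint_name must be unique among the blueprints registered in Cloudbreak.
bp['Blueprints']['blueprint_name'] = 'hdp301-ha-solr-custom'
# Print each host group, its cardinality, and whether it carries the Solr server.
for hg in bp['host_groups']:
    components = [c['name'] for c in hg['components']]
    print(hg['name'], hg['cardinality'], 'SOLR_SERVER' in components)
with open('hdp301-ha-solr-custom.json', 'w') as f:
    json.dump(bp, f, indent=2)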
2. Register Management Pack
HDPSearch is installed via an Ambari Management Pack. To automate the deployment of HDPSearch via a blueprint, you need to register the HDPSearch Management Pack with Cloudbreak.
In the left menu, click on Cluster Extensions . This will expand to show Recipes and Management Packs . Click on Management Packs . You should see something similar to the following:
Click on REGISTER MANAGEMENT PACK . You should see something similar to the following:
Enter a unique Name and meaningful Description . The Management Pack URL for the HDPSearch 4.0 Management Pack should be http://public-repo-1.hortonworks.com/HDP-SOLR/hdp-solr-ambari-mp/solr-service-mpack-4.0.0.tar.gz .
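Before clicking Create, you can optionally confirm that the management pack tarball is reachable from your network. A small sketch using the Python requests library (this is just a convenience check, not something Cloudbreak requires):
# Sketch: verify the HDPSearch management pack tarball is downloadable before registering it.
import requests
mpack_url = ('http://public-repo-1.hortonworks.com/HDP-SOLR/'
             'hdp-solr-ambari-mp/solr-service-mpack-4.0.0.tar.gz')
resp = requests.head(mpack_url, allow_redirects=True, timeout=30)
print(resp.status_code, resp.headers.get('Content-Length'))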
Click Create . You should see something similar to the following:
3. Create Cluster
Now that we have a custom blueprint based on HDP 3.0 with a Solr component and we have registered the HDPSearch 4.0 Management Pack, we are ready to create a cluster.
In the left menu, click on Clusters . Cloudbreak will display configured clusters. Click the CREATE CLUSTER button. Cloudbreak will display the Create Cluster wizard.
a. General Configuration
By default, the General Configuration screen is displayed using the BASIC view. The ADVANCED view gives you more control of AWS and cluster settings, to include features such as tags. You must use ADVANCED view to attach a Management Pack to a cluster. You can change your view to ADVANCED manually or you can change your Cloudbreak preferences to show ADVANCED view by default. You should see something similar to the following:
Select Credential: Select the AWS credential you created. Most users will only have 1 credential per platform which will be selected automatically.
Cluster Name: Enter a name for your cluster. The name must be between 5 and 40 characters, must start with a letter, and must only include lowercase letters, numbers, and hyphens.
Region: Select the region in which you would like to launch your cluster.
Availability Zone: Select the availability zone in which you would like to launch your cluster.
Platform Version: Cloudbreak currently defaults to HDP 2.6. Select the dropdown arrow and select HDP 3.0 .
Cluster Type: Select the custom blueprint you recently created.
You should see something similar to the following:
Click the green Next button.
b. Image Settings
Cloudbreak will display the Image Settings screen. This is where you can specify a custom Cloudbreak image or change the version of Ambari and HDP used in the cluster. You should see something similar to the following:
You do not need to change any settings on this page. Click the green NEXT button.
c. Hardware and Storage
Cloudbreak will display the Hardware and Storage screen. On this screen, you have the ability to change the instance types, attached storage, and where the Ambari server will be installed. As you can see, the blueprint calls for deploying at least 4 nodes. We will use the defaults.
Click the green Next button.
d. Network and Availability
Cloudbreak will display the Network and Availability screen. On this screen, you have the ability to create a new VPC and Subnet or select from existing ones. The default is to create a new VPC and Subnet. We will use the defaults.
Click the green Next button.
e. Cloud Storage
Cloudbreak will display the Cloud Storage screen. On this screen, you have the ability to configure your cluster to have an instance profile allowing the cluster to access data on cloud storage. The default is to not configure cloud storage. We will use the defaults.
Click the green Next button.
f. Cluster Extensions
Cloudbreak will display the Cluster Extensions screen. On this screen, you have the ability to associate recipes with different host groups and attach management packs to the cluster. You should see something similar to the following:
This screen is where we associate the HDPSearch 4.0 management pack we registered previously. Select the dropdown under Available Management Packs . Select the HDPSearch 4.0 management pack you registered. Then click the Install button. You should see something similar to the following:
Click the green Next button.
g. External Sources
Cloudbreak will display the External Sources screen. On this screen, you have the ability to associate external sources like LDAP/AD and databases. You should see something similar to the following:
We will not be using this functionality with this cluster. Click the green Next button.
h. Gateway Configuration
Cloudbreak will display the Gateway Configuration screen. On this screen, you have the ability to enable a protected gateway. This gateway uses Knox to provide a secure access point for the cluster. You should see something similar to the following:
We will use the defaults. Click the green Next button.
i. Network Security Groups
Cloudbreak will display the Network Security Groups screen. On this screen, you have the ability to specify the Network Security Groups . You should see something similar to the following:
Cloudbreak defaults to creating new configurations. For production use cases, we highly recommend creating and refining your own definitions within the cloud platform. You can tell Cloudbreak to use those existing security groups by selecting the radio button. We need to add the Solr default port of 8983 to the host group where Solr will exist. This is the Data Node in the blueprint. I recommend that you specify "MyIP" to limit access to this port. You should see something similar to the following:
Click the green Next button.
j. Security
Cloudbreak will display the Security screen. On this screen, you have the ability to specify the Ambari admin username and password. You can create a new SSH key or select an existing one. And finally, you have the ability to enable Kerberos on the cluster. We will use admin for the username and BadPass#1 for the password. Select an existing SSH key from the drop down list. This should be a key you have already created and for which you have access to the corresponding private key. We will NOT be enabling Kerberos, so make sure the Enable Kerberos Security checkbox is not checked. You should see something similar to the following:
Click the green CREATE CLUSTER button.
k. Cluster Summary
Cloudbreak will display the Cluster Summary page. It will generally take between 10-15 minutes for the cluster to be fully deployed. Click on the cluster you just created. You should see something similar to the following:
Click on the Ambari URL to open the Ambari UI.
l. Ambari
You will likely see a browser warning when you first open the Ambari UI. That is because we are using self-signed certificates.
Click on the ADVANCED button. Then click the link to Proceed .
You will be presented with the Ambari login page. You will login using the username and password you specified when you created the cluster. That should have been admin and BadPass#1 . Click the green Sign In button.
You should see the cluster summary screen. As you can see, we have a cluster which includes the Solr component.
Click on the Solr service in the left hand menu. Now you can access the Quick Links menu for a shortcut to the Solr UI.
You should see the Solr UI. As you can see, this is Solr 7.4.
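If you prefer to verify from a script instead of the UI, Solr exposes a system info endpoint on port 8983 (the port we opened in the Network Security Groups step). A sketch with a placeholder data node address:
# Sketch: confirm the Solr version via the admin API (replace the placeholder host).
import requests
solr_host = 'datanode-public-ip.example.com'   # hypothetical public address of a data node
url = 'http://%s:8983/solr/admin/info/system?wt=json' % solr_host
info = requests.get(url, timeout=30).json()
print(info['lucene']['solr-spec-version'])     # should report a 7.4.x version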
Review
If you have successfully followed along with this tutorial, you should have created a custom HDP 3.0 blueprint which includes the Solr component, registered the HDPSearch 4.0 Management Pack, and successfully deployed a cluster on AWS which includes HDPSearch 4.0.
... View more
10-26-2018
05:54 PM
5 Kudos
Objectives
The release of Cloudbreak 2.7 enables you to deploy Hortonworks Data Flow (HDF) clusters. Currently there are two HDF cluster types supported: Flow Management (NiFi) and Messaging Management (Kafka). Cloudbreak expects HDF clusters to be deployed with security (LDAP, SSL). However, for testing purposes, many people would like to deploy a cluster without having to go through the steps of setting up SSL, LDAP, etc. Therefore, we'll need to modify the default HDF Flow Management blueprint to loosen the security configuration. This is not recommended for production use cases.
This tutorial is designed to walk you through the process of deploying an HDF 3.1 Flow Management cluster on AWS with Cloudbreak 2.7 using a custom blueprint.
Prerequisites
You should already have an installed version of Cloudbreak 2.7.
You can find the documentation on Cloudbreak here: Cloudbreak Documentation
You can find an article that walks you through installing a local version of Cloudbreak with Vagrant and Virtualbox here: HCC Article
You should have an AWS account with appropriate permissions.
You can read more about AWS permissions here: Cloudbreak Documentation
You should already have created your AWS credential in Cloudbreak.
Scope
This tutorial was tested in the following environment:
Cloudbreak 2.7.0
AWS (also works on Azure and Google)
Steps
1. Create New HDF Blueprint
Login to your Cloudbreak instance. In the left menu, click on Blueprints . Cloudbreak will display a list of built-in and custom blueprints. Click on the Flow Management: Apache NiFi, Apache NiFi Registry blueprint. You should see something similar to the following:
Now click on the RAW VIEW tab. You should see something similar to the following:
Now we need to copy the raw JSON from this blueprint. We need to make some modifications. Copy and paste the blueprint into your favorite text editor.
Change the blueprint_name line to "blueprint_name": "hdf-nifi-no-kerberos", . This is the name of the blueprint and it must be unique among the blueprints registered in Cloudbreak.
In the nifi-properties section we need to add a new line. We are going to add "nifi.security.user.login.identity.provider": "" . This change tells NiFi not to use an Identity Provider. Change this:
{
"nifi-properties": {
"nifi.sensitive.props.key": "changemeplease",
"nifi.security.identity.mapping.pattern.kerb": "^(.*?)@(.*?)$",
"nifi.security.identity.mapping.value.kerb": "$1",
}
},
to this:
{
"nifi-properties": {
"nifi.sensitive.props.key": "changemeplease",
"nifi.security.identity.mapping.pattern.kerb": "^(.*?)@(.*?)$",
"nifi.security.identity.mapping.value.kerb": "$1",
"nifi.security.user.login.identity.provider": ""
}
},
In the nifi-ambari-ssl-config section we need to change the nifi.node.ssl.isenabled setting from true to false . This change disables SSL between the NiFi nodes. Change this:
"nifi-ambari-ssl-config": {
"nifi.toolkit.tls.token": "changemeplease",
"nifi.node.ssl.isenabled": "true",
"nifi.toolkit.dn.prefix": "CN=",
"nifi.toolkit.dn.suffix": ", OU=NIFI"
}
to this:
"nifi-ambari-ssl-config": {
"nifi.toolkit.tls.token": "changemeplease",
"nifi.node.ssl.isenabled": "false",
"nifi.toolkit.dn.prefix": "CN=",
"nifi.toolkit.dn.suffix": ", OU=NIFI"
}
In the nifi-registry-ambari-ssl-config section we need to change the nifi.registry.ssl.isenabled setting from true to false . This change disables SSL for the NiFi Registry. Change this:
"nifi-registry-ambari-ssl-config": {
"nifi.registry.ssl.isenabled": "true",
"nifi.registry.toolkit.dn.prefix": "CN=",
"nifi.registry.toolkit.dn.suffix": ", OU=NIFI"
}
to this:
"nifi-registry-ambari-ssl-config": {
"nifi.registry.ssl.isenabled": "false",
"nifi.registry.toolkit.dn.prefix": "CN=",
"nifi.registry.toolkit.dn.suffix": ", OU=NIFI"
}
Under host_groups and Services we need to remove the NIFI_CA entry. This change removes the NiFi Certificate Authority. Change this:
"host_groups": [
{
"name": "Services",
"components": [
{
"name": "NIFI_CA"
}, {
"name": "NIFI_REGISTRY_MASTER"
},
to this:
"host_groups": [
{
"name": "Services",
"components": [
{
"name": "NIFI_REGISTRY_MASTER"
},
The complete blueprint looks like this:
{
"Blueprints": {
"blueprint_name": "hdf-nifi-no-kerberos",
"stack_name": "HDF",
"stack_version": "3.1"
},
"configurations": [
{
"nifi-ambari-config": {
"nifi.security.encrypt.configuration.password": "changemeplease",
"nifi.max_mem": "1g"
}
},
{
"nifi-properties": {
"nifi.sensitive.props.key": "changemeplease",
"nifi.security.identity.mapping.pattern.kerb": "^(.*?)@(.*?)$",
"nifi.security.identity.mapping.value.kerb": "$1",
"nifi.security.user.login.identity.provider": ""
}
},
{
"nifi-ambari-ssl-config": {
"nifi.toolkit.tls.token": "changemeplease",
"nifi.node.ssl.isenabled": "false",
"nifi.toolkit.dn.prefix": "CN=",
"nifi.toolkit.dn.suffix": ", OU=NIFI"
}
},
{
"nifi-registry-ambari-config": {
"nifi.registry.security.encrypt.configuration.password": "changemeplease"
}
},
{
"nifi-registry-properties": {
"nifi.registry.sensitive.props.key": "changemeplease",
"nifi.registry.security.identity.mapping.pattern.kerb": "^(.*?)@(.*?)$",
"nifi.registry.security.identity.mapping.value.kerb": "$1"
}
},
{
"nifi-registry-ambari-ssl-config": {
"nifi.registry.ssl.isenabled": "false",
"nifi.registry.toolkit.dn.prefix": "CN=",
"nifi.registry.toolkit.dn.suffix": ", OU=NIFI"
}
}
],
"host_groups": [
{
"name": "Services",
"components": [
{
"name": "NIFI_REGISTRY_MASTER"
},
{
"name": "METRICS_COLLECTOR"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "METRICS_GRAFANA"
},
{
"name": "ZOOKEEPER_CLIENT"
}
],
"cardinality": "1"
},
{
"name": "NiFi",
"components": [
{
"name": "NIFI_MASTER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "ZOOKEEPER_CLIENT"
}
],
"cardinality": "1+"
},
{
"name": "ZooKeeper",
"components": [
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "ZOOKEEPER_CLIENT"
}
],
"cardinality": "3+"
}
]
}
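If you would rather script these three changes than make them by hand, a sketch like the following applies the same edits. It is purely illustrative; the input file name is a placeholder for wherever you saved the raw blueprint copied from Cloudbreak:
# Sketch: apply the edits described above to the default Flow Management blueprint.
import json
with open('hdf-flow-management-default.json') as f:
    bp = json.load(f)
bp['Blueprints']['blueprint_name'] = 'hdf-nifi-no-kerberos'
for entry in bp['configurations']:
    if 'nifi-properties' in entry:
        entry['nifi-properties']['nifi.security.user.login.identity.provider'] = ''
    if 'nifi-ambari-ssl-config' in entry:
        entry['nifi-ambari-ssl-config']['nifi.node.ssl.isenabled'] = 'false'
    if 'nifi-registry-ambari-ssl-config' in entry:
        entry['nifi-registry-ambari-ssl-config']['nifi.registry.ssl.isenabled'] = 'false'
# Drop the NiFi Certificate Authority component from every host group that has it.
for hg in bp['host_groups']:
    hg['components'] = [c for c in hg['components'] if c['name'] != 'NIFI_CA']
with open('hdf-nifi-no-kerberos.json', 'w') as f:
    json.dump(bp, f, indent=2)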
Save the updated blueprint to a file. Click on the CREATE BLUEPRINT button. You should see the Create Blueprint screen.
Enter the name of the new blueprint, something helpful such as hdf-nifi-no-kerberos . Click on the Upload JSON File button and upload the blueprint you just saved. You should see the new blueprint you created.
2. Create Cluster
In the left menu, click on Clusters . Cloudbreak will display configured clusters. Click the CREATE CLUSTER button. Cloudbreak will display the Create Cluster wizard
a. General Configuration
By default, the General Configuration screen is displayed using the BASIC view. The ADVANCED view gives you more control of AWS and cluster settings, to include features such as tags. You can change your view to ADVANCED manually or you can change your Cloudbreak preferences to show ADVANCED view by default. We will use the BASIC view.
Credential: Select the AWS credential you created. Most users will only have 1 credential per platform which will be selected automatically.
Cluster Name: Enter a name for your cluster. The name must be between 5 and 40 characters, must start with a letter, and must only include lowercase letters, numbers, and hyphens.
Region: Select the region in which you would like to launch your cluster.
Platform Version: Cloudbreak currently defaults to HDP 2.6. Select the dropdown arrow and select HDF 3.1 .
Cluster Type: As mentioned previously, there are two supported cluster types. Make sure to select the blueprint you just created.
Click the green NEXT button.
b. Hardware and Storage
Cloudbreak will display the Hardware and Storage screen. On this screen, you have the ability to change the instance types, attached storage, and where the Ambari server will be installed. As you can see, we will deploy 1 NiFi and 1 Zookeeper node. In a production environment you would typically have at least 3 Zookeeper nodes. We will use the defaults.
Click the green NEXT button.
c. Gateway Configuration
Cloudbreak will display the Gateway Configuration screen. On this screen, you have the ability to enable a protected gateway. This gateway uses Knox to provide a secure access point for the cluster. Cloudbreak 2.7 does not currently support configuring Knox for HDF. We will leave this option disabled.
Click the green NEXT button.
d. Network
Cloudbreak will display the Network screen. On this screen, you have the ability to specify the Network , Subnet , and Security Groups . Cloudbreak defaults to creating new configurations. For production use cases, we highly recommend creating and refining your own definitions within the cloud platform. You can tell Cloudbreak to use those via the drop down menus. We will use the default options to create new configurations.
Because we are using a custom blueprint which disables SSL, we need to update the security groups with correct ports for the NiFi and NiFi Registry UIs. In the SERVICES security group, add the port 61080 with TCP . Click the + button to add the rule. In the NIFI security group, add the port 9090 with TCP . Click the + button to add the rule.
You should see something similar to the following:
Click the green NEXT button.
e. Security
Cloudbreak will display the Security screen. On this screen, you have the ability to specify the Ambari admin username and password. You can create a new SSH key or select an existing one. And finally, you have the ability to enable Kerberos on the cluster. We will use admin for the username and BadPass#1 for the password. Select an existing SSH key from the drop down list. This should be a key you have already created and for which you have access to the corresponding private key. We will NOT be enabling Kerberos, so uncheck the Enable Kerberos Security checkbox.
You have the ability to display a JSON version of the blueprint. You also have the ability to display a JSON version of the cluster definition. Both of these can be used with the Cloudbreak CLI to programmatically automate these operations.
Click the green CREATE CLUSTER button.
f. Cluster Summary
Cloudbreak will display the Cluster Summary page. It will generally take between 10-15 minutes for the cluster to be fully deployed. As you can see, this screen looks similar to an HDP cluster. The big difference is the Blueprint and HDF Version .
Click on the Ambari URL to open the Ambari UI.
g. Ambari
You will likely see a browser warning when you first open the Ambari UI. That is because we are using self-signed certificates.
Click on the ADVANCED button. Then click the link to Proceed .
You will be presented with the Ambari login page. You will login using the username and password you specified when you created the cluster. That should have been admin and BadPass#1 . Click the green Sign In button.
You should see the cluster summary screen. As you can see, we have a cluster with Zookeeper, NiFi, and the NiFi Registry.
Click on the NiFi service in the left hand menu. Now you can access the Quick Links menu for a shortcut to the NiFi UI.
You should see the NiFi UI.
Back in the Ambari UI, click on the NiFi Registry service in the left hand menu. Now you can access the Quick Links menu for a shortcut to the NiFi Registry UI.
You should see the NiFi Registry UI.
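If you want to double-check that the ports opened earlier (9090 for NiFi, 61080 for the NiFi Registry) are reachable without going through Ambari, here is a quick sketch; the hostnames are placeholders for the public addresses of your NiFi and Services nodes:
# Sketch: confirm the NiFi and NiFi Registry UIs respond on the ports opened earlier.
import requests
nifi_host = 'nifi-node-public-ip.example.com'          # hypothetical NiFi node address
services_host = 'services-node-public-ip.example.com'  # hypothetical Services node address
print(requests.get('http://%s:9090/nifi' % nifi_host, timeout=30).status_code)                # expect 200
print(requests.get('http://%s:61080/nifi-registry' % services_host, timeout=30).status_code)  # expect 200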
Review
If you have successfully followed along with this tutorial, you should have created a Flow Management (NiFi) cluster on AWS using a custom blueprint. This cluster has SSL and LDAP disabled for rapid prototyping purposes.
... View more
07-04-2018
03:49 PM
1 Kudo
@mmolnar This is great feedback. I've updated the article to include a link for downloading the files. Thank you!
... View more
05-30-2018
12:11 PM
@amit nandi Can you provide step-by-step instructions on how to install Anaconda for HDP?
... View more
11-08-2017
09:33 AM
@Michael Young, Nice article. I have followed the steps and am trying to set up auto-scaling for a Hadoop cluster. One issue I'm facing is that the Ambari metrics are not getting listed while creating an alert. Any help would be appreciated. Thanks!
... View more
08-23-2017
02:51 AM
After building the image: Error response from daemon: No such image: zeppelinhub:latest
After pulling the image: Error response from daemon: No such image: zeppelinhub:latest
... View more
05-24-2017
06:06 PM
3 Kudos
This tutorial will walk you through the process of using Cloudbreak recipes to install TensorFlow for Anaconda Python on an HDP 2.6 cluster during cluster provisioning. We'll then update Zeppelin to use the newly installed version of Anaconda and run a quick TensorFlow test.
Prerequisites
You should already have a Cloudbreak v1.14.4 environment running. You can follow this article to create a Cloudbreak instance using Vagrant and Virtualbox: HCC Article
You should already have created a blueprint that deploys HDP 2.6 with Spark 2.1. You can follow this article to get the blueprint setup. Do not create the cluster yet, as we will do that in this tutorial: HCC Article
You should already have credentials created in Cloudbreak for deploying on AWS (or Azure). This tutorial does not cover creating credentials.
Scope
This tutorial was tested in the following environment:
Cloudbreak 1.14.4
AWS EC2
HDP 2.6
Spark 2.1
Anaconda 2.7.13
TensorFlow 1.1.0
Steps
Create Recipe
Before you can use a recipe during a cluster deployment, you have to create the recipe. In the Cloudbreak UI, look for the manage recipes section. It should look similar to this:
If this is your first time creating a recipe, you will have 0 recipes instead of the 2 recipes shown in my interface.
Now click on the arrow next to manage recipes to display available recipes. You should see something similar to this:
Now click on the green create recipe button. You should see something similar to this:
Now we can enter the information for our recipe. I'm calling this recipe tensorflow . I'm giving it the description of Install TensorFlow Python . You can choose to run the script as either pre-install or post-install . I'm choosing to do the install post-install . This means the script will be run after the Ambari installation process has started. So choose the Execution Type of POST . The script is fairly basic. We are going to download the Anaconda install script, then run it in silent mode. Then we'll use the Anaconda version of pip to install TensorFlow. Here is the script:
#!/bin/bash
wget https://repo.continuum.io/archive/Anaconda2-4.3.1-Linux-x86_64.sh
bash ./Anaconda2-4.3.1-Linux-x86_64.sh -b -p /opt/anaconda
/opt/anaconda/bin/pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.1.0-cp27-none-linux_x86_64.whl
You can read more about installing TensorFlow on Anaconda here: TensorFlow Docs.
When you have finished entering all of the information, you should see something similar to this:
If everything looks good, click on the green create recipe button.
You should be able to see the recipe in your list of recipes:
NOTE: You will most likely have a different list of recipes.
Create a Cluster using a Recipe
Now that our recipe has been created, we can create a cluster that uses the recipe. Go through the process of creating a cluster up to the Choose Blueprint step. This step is where you select the recipe you want to use. The recipes are not selected by default; you have to select the recipes you wish to use. You can specify recipes for 1 or more host groups. This allows you to run different recipes across different host groups (masters, slaves, etc). You can also select multiple recipes.
We want to use the hdp26-spark-21-cluster blueprint. This will create an HDP 2.6 cluster with Spark 2.1 and Zeppelin. You should have created this blueprint when you followed the prerequisite tutorial. You should see something similar to this:
In our case, we are going to run the tensorflow recipe on every host group. If you intend to use something like TensorFlow across the cluster, you should install it on at least the slave nodes and the client nodes.
After you have selected the recipe for the host groups, click the Review & Launch button, then launch the cluster. As the cluster is building, you should see a message in the Cloudbreak UI that indicates the recipe is running. When that happens, you will see something similar to this:
If you click on the building cluster, you can see more detailed information. You should see something similar to this:
Once the cluster has finished building, you should see something similar to this:
Cloudbreak will create logs for each recipe that runs on each host. These logs are located at /var/log/recipes and have the name of the recipe and whether it is pre- or post-install. For example, our recipe log is called post-tensorflow.log . You can tail this log file to follow the execution of the script.
NOTE: Post-install scripts won't be executed until the Ambari server is installed and the cluster is building. You can always monitor the /var/log/recipes directory on a node to see when the script is being executed. The time it takes to run the script will vary depending on the cloud environment and how long it takes to spin up the cluster.
On your cluster, you should be able to see the post-install log:
$ ls /var/log/recipes
post-tensorflow.log post-hdfs-home.log
Verify Anaconda Install
Once the install process is complete, you should be able to verify that Anaconda is installed. You need to ssh into one of the cloud instances. You can get the public IP address from the Cloudbreak UI. You will login using the private key corresponding to the public key you entered when you created the Cloudbreak credential. You should login as the cloudbreak user. You should see something similar to this:
$ ssh -i ~/Downloads/keys/cloudbreak_id_rsa cloudbreak@#.#.#.#
The authenticity of host '#.#.#.# (#.#.#.#)' can't be established.
ECDSA key fingerprint is SHA256:By1MJ2sYGB/ymA8jKBIfam1eRkDS5+DX1THA+gs8sdU.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '#.#.#.#' (ECDSA) to the list of known hosts.
Last login: Sat May 13 00:47:41 2017 from 192.175.27.2
__| __|_ )
_| ( / Amazon Linux AMI
___|\___|___|
https://aws.amazon.com/amazon-linux-ami/2016.09-release-notes/
25 package(s) needed for security, out of 61 available
Run "sudo yum update" to apply all updates.
Amazon Linux version 2017.03 is available.
Once you are on the server, you can check the version of Python:
$ /opt/anaconda/bin/python --version
Python 2.7.13 :: Anaconda 4.3.1 (64-bit)
Update Zeppelin Interpreter
We need to update the default spark2 interpreter configuration in Zeppelin. To access the Zeppelin UI, login to Ambari for the new cluster from the Cloudbreak UI cluster details page, then use the Ambari Quicklink for Zeppelin. You should see something similar to this:
After you access the Zeppelin UI, click the blue login button in the upper right corner of the interface. You can login using the default username and password of admin . After you login to Zeppelin, click the admin button in the upper right corner of the interface. This will expose the options menu. You should see something similar to this:
Click on the Interpreter link in the menu. This will display all of the configured interpreters. Find the spark2 interpreter. You can see the default setting for zeppelin.pyspark.python is set to python . This will use whichever Python is found in the path. You should see something similar to this:
We will need to change this to /opt/anaconda/bin/python which is where we have Anaconda Python installed. Click on the edit button and change zeppelin.pyspark.python to /opt/anaconda/bin/python . You should see something similar to this:
Now we can click the blue save button at the bottom. The configuration changes are now saved, but we need to restart the interpreter for the changes to take effect. Click on the restart button to restart the interpreter.
Create Zeppelin Notebook
Now that our spark2 interpreter configuration has been updated, we can create a notebook to test Anaconda + TensorFlow. Click on the Notebook menu. You should see something similar to this:
Click on the Create new note link. You can give the notebook any descriptive name you like. Select spark2 as the default interpreter. You should see something similar to this:
Your notebook will start with a blank paragraph. For the first paragraph, let's test the version of Spark we are using. Enter the following in the first paragraph:
%spark2.pyspark
sc.version
Now click the run button for the paragraph. You should see something similar to this:
u'2.1.0.2.6.0.3-8'
As you can see, we are using Spark 2.1. Now in the second paragraph, we'll test the version of Python. We already know the command-line version is 2.7.13. Enter the following in the second paragraph:
%spark2.pyspark
import sys
print sys.version_info
Now click the run button for the paragraph. You should see something similar to this:
sys.version_info(major=2, minor=7, micro=13, releaselevel='final', serial=0)
As you can see, we are running Python version 2.7.13.
Now we can test TensorFlow. Enter the following in the third paragraph:
%spark2.pyspark
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
a = tf.constant(10)
b = tf.constant(32)
print(sess.run(a + b))
This simple code comes from the TensorFlow website: https://www.tensorflow.org/versions/r0.10/get_started/os_setup#anaconda_installation. Now click the run button for the paragraph. You may see some warning messages the first time you run it, but you should also see the following output:
Hello, TensorFlow!
42
As you can see, TensorFlow is working from Zeppelin, which is using Spark 2.1 and Anaconda. If everything works properly, your notebook should look something similar to this:
Admittedly this example is very basic, but it demonstrates the components are working together. For next steps, try running other TensorFlow code. Here are some examples you can work with: GitHub.
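As one more small next step, here is another paragraph you could try in the same notebook. It uses the same TensorFlow 1.x session API as the example above and is only meant as an illustration:
%spark2.pyspark
import tensorflow as tf
# Feed two vectors through placeholders and compute their dot product.
x = tf.placeholder(tf.float32, shape=[3])
y = tf.placeholder(tf.float32, shape=[3])
dot = tf.reduce_sum(x * y)
sess = tf.Session()
print(sess.run(dot, feed_dict={x: [1.0, 2.0, 3.0], y: [4.0, 5.0, 6.0]}))   # prints 32.0
sess.close()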
Review
If you have successfully followed along with this tutorial, you should have deployed an HDP 2.6 cluster in the cloud with Anaconda installed under /opt/anaconda and added the TensorFlow Python modules using a Cloudbreak recipe. You should have created a Zeppelin notebook which uses Anaconda Python, Spark 2.1 and TensorFlow.
... View more
Labels: