Introduction

Overview

Cloudera/Hortonworks today offers one of the most comprehensive data management platforms available, with components covering everything from data flow management to governed, distributed data science workloads.

With so many toys to play with, I thought I'd share an easy way to set up a simple cluster that, using Cloudbreak, deploys the following main components on Azure:

  • Hortonworks Data Platform 3.1
  • Hortonworks Data Flow 3.3
  • Data Platform Search 4.0
  • Cloudera Data Science Workbench 1.5

Note: This is not a production-ready setup, but merely a first step to customizing your deployment using the Cloudera toolkit.

Pre-Requisites

This tutorial assumes you already have a Cloudbreak instance up and running (Cloudbreak 2.9 was used here) and an Azure subscription in which you can create credentials.

Tutorial steps

  • Step 1: Set up Azure Credentials
  • Step 2: Set up blueprint and cluster extensions
  • Step 3: Create cluster

Step 1: Set up Azure Credentials

Find your Azure subscription and tenant ID

To find your subscription ID, search for "subscriptions" in the Azure portal search box; you should see something like this:

108114-screen-shot-2019-04-23-at-23119-pm.png

For the tenant ID, use the Azure AD Directory ID:

108147-screen-shot-2019-04-23-at-23216-pm.png
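
If you prefer the command line, both values can also be retrieved with the Azure CLI (a quick sketch, assuming az is installed and you are logged in):

# Print the subscription ID and tenant ID of the currently active subscription
az account show --query "{subscriptionId:id, tenantId:tenantId}" --output table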

Set up your credentials in Cloudbreak

This part is extremely well documented in Cloudbreak's documentation portal: https://docs.hortonworks.com/HDPDocuments/Cloudbreak/Cloudbreak-2.9.0/create-credential-azure/conten....

Note: Because of IT restrictions on my side, I chose to use an app-based credential setup, but if you have enough privileges, Cloudbreak creates the app and assigns the roles automagically for you.
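
For reference, an app registration suitable for an app-based credential can be created with the Azure CLI along these lines (a sketch only; the name is arbitrary, the subscription ID is a placeholder, and your IT policies may require narrower role scoping). The Cloudbreak credential form then takes the resulting appId, password, and tenant:

# Create a service principal with Contributor rights on the target subscription
# ("cloudbreak-credential" is just an example name)
az ad sp create-for-rbac --name cloudbreak-credential --role Contributor --scopes /subscriptions/<your-subscription-id>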

Step 2: Set up blueprint and cluster extensions

Blueprint

108115-screen-shot-2019-04-23-at-31052-pm.png

First upload the blueprint below:

{
  "Blueprints": {
    "blueprint_name": "edge-to-ai-3.1",
    "stack_name": "HDP",
    "stack_version": "3.1"
  },
  "configurations": [
    {
      "yarn-site": {
        "properties": {
          "yarn.nodemanager.resource.cpu-vcores": "6",
          "yarn.nodemanager.resource.memory-mb": "60000",
          "yarn.scheduler.maximum-allocation-mb": "14"
        }
      }
    },
    {
      "hdfs-site": {
        "properties": {
          "dfs.cluster.administrators": "hdfs"
        }
      }
    },
    {
      "capacity-scheduler": {
        "properties": {
          "yarn.scheduler.capacity.maximum-am-resource-percent": "0.4",
          "yarn.scheduler.capacity.root.capacity": "67",
          "yarn.scheduler.capacity.root.default.capacity": "67",
          "yarn.scheduler.capacity.root.default.maximum-capacity": "67",
          "yarn.scheduler.capacity.root.llap.capacity": "33",
          "yarn.scheduler.capacity.root.llap.maximum-capacity": "33",
          "yarn.scheduler.capacity.root.queues": "default,llap"
        }
      }
    },
    {
      "ranger-hive-audit": {
        "properties": {
          "xasecure.audit.destination.hdfs.file.rollover.sec": "300"
        },
        "properties_attributes": {}
      }
    },
    {
      "hive-site": {
        "hive.exec.compress.output": "true",
        "hive.merge.mapfiles": "true",
        "hive.metastore.dlm.events": "true",
        "hive.metastore.transactional.event.listeners": "org.apache.hive.hcatalog.listener.DbNotificationListener",
        "hive.repl.cm.enabled": "true",
        "hive.repl.cmrootdir": "/apps/hive/cmroot",
        "hive.repl.rootdir": "/apps/hive/repl",
        "hive.server2.tez.initialize.default.sessions": "true",
        "hive.server2.transport.mode": "http"
      }
    },
    {
      "hive-interactive-env": {
        "enable_hive_interactive": "true",
        "hive_security_authorization": "Ranger",
        "num_llap_nodes": "1",
        "num_llap_nodes_for_llap_daemons": "1",
        "num_retries_for_checking_llap_status": "50"
      }
    },
    {
      "hive-interactive-site": {
        "hive.exec.orc.split.strategy": "HYBRID",
        "hive.llap.daemon.num.executors": "5",
        "hive.metastore.rawstore.impl": "org.apache.hadoop.hive.metastore.cache.CachedStore",
        "hive.stats.fetch.bitvector": "true"
      }
    },
    {
      "spark2-defaults": {
        "properties": {
          "spark.datasource.hive.warehouse.load.staging.dir": "/tmp",
          "spark.datasource.hive.warehouse.metastoreUri": "thrift://%HOSTGROUP::master1%:9083",
          "spark.hadoop.hive.zookeeper.quorum": "{{zookeeper_quorum_hosts}}",
          "spark.sql.hive.hiveserver2.jdbc.url": "jdbc:hive2://{{zookeeper_quorum_hosts}}:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive",
          "spark.sql.hive.hiveserver2.jdbc.url.principal": "hive/_HOST@EC2.INTERNAL"
        },
        "properties_attributes": {}
      }
    },
    {
      "gateway-site": {
        "properties": {
          "gateway.path": "{{cluster_name}}"
        },
        "properties_attributes": {}
      }
    },
    {
      "admin-topology": {
        "properties": {
          "content": "\n \n\n \n\n \n authentication\n ShiroProvider\n true\n \n sessionTimeout\n 30\n \n \n main.ldapRealm\n org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm\n \n \n main.ldapRealm.userDnTemplate\n uid={0},ou=people,dc=hadoop,dc=apache,dc=org\n \n \n main.ldapRealm.contextFactory.url\n ldap://54.219.163.9:33389\n \n \n main.ldapRealm.contextFactory.authenticationMechanism\n simple\n \n \n urls./**\n authcBasic\n \n \n\n \n authorization\n AclsAuthz\n true\n \n \n\n \n KNOX\n \n\n "
        },
        "properties_attributes": {}
      }
    },
    {
      "ranger-admin-site": {
        "properties": {
          "ranger.jpa.jdbc.url": "jdbc:postgresql://localhost:5432/ranger"
        },
        "properties_attributes": {}
      }
    },
    {
      "ranger-env": {
        "properties": {
          "is_solrCloud_enabled": "true",
          "keyadmin_user_password": "{{{ general.password }}}",
          "ranger-atlas-plugin-enabled": "Yes",
          "ranger-hdfs-plugin-enabled": "Yes",
          "ranger-hive-plugin-enabled": "Yes",
          "ranger-knox-plugin-enabled": "Yes",
          "ranger_admin_password": "{{{ general.password }}}",
          "rangertagsync_user_password": "{{{ general.password }}}",
          "rangerusersync_user_password": "{{{ general.password }}}"
        },
        "properties_attributes": {}
      }
    },
    {
      "ams-hbase-site": {
        "properties": {
          "hbase.cluster.distributed": "true",
          "hbase.rootdir": "file:///hadoopfs/fs1/metrics/hbase/data"
        }
      }
    },
    {
      "atlas-env": {
        "properties": {
          "atlas.admin.password": "admin",
          "atlas_solr_shards": "2",
          "content": "\n # The java implementation to use. If JAVA_HOME is not found we expect java and jar to be in path\n export JAVA_HOME={{java64_home}}\n\n # any additional java opts you want to set. This will apply to both client and server operations\n {% if security_enabled %}\n export ATLAS_OPTS=\"{{metadata_opts}} -Djava.security.auth.login.config={{atlas_jaas_file}}\"\n {% else %}\n export ATLAS_OPTS=\"{{metadata_opts}}\"\n {% endif %}\n\n # metadata configuration directory\n export ATLAS_CONF={{conf_dir}}\n\n # Where log files are stored. Defatult is logs directory under the base install location\n export ATLAS_LOG_DIR={{log_dir}}\n\n # additional classpath entries\n export ATLASCPPATH={{metadata_classpath}}\n\n # data dir\n export ATLAS_DATA_DIR={{data_dir}}\n\n # pid dir\n export ATLAS_PID_DIR={{pid_dir}}\n\n # hbase conf dir\n export HBASE_CONF_DIR=\"/etc/ams-hbase/conf\"\n\n # Where do you want to expand the war file. By Default it is in /server/webapp dir under the base install dir.\n export ATLAS_EXPANDED_WEBAPP_DIR={{expanded_war_dir}}\n export ATLAS_SERVER_OPTS=\"-server -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$ATLAS_LOG_DIR/atlas_server.hprof -Xloggc:$ATLAS_LOG_DIRgc-worker.log -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps\"\n {% if java_version == 8 %}\n export ATLAS_SERVER_HEAP=\"-Xms{{atlas_server_xmx}}m -Xmx{{atlas_server_xmx}}m -XX:MaxNewSize={{atlas_server_max_new_size}}m -XX:MetaspaceSize=100m -XX:MaxMetaspaceSize=512m\"\n {% else %}\n export ATLAS_SERVER_HEAP=\"-Xms{{atlas_server_xmx}}m -Xmx{{atlas_server_xmx}}m -XX:MaxNewSize={{atlas_server_max_new_size}}m -XX:MaxPermSize=512m\"\n {% endif %}\n",
          "hbase_conf_dir": "/etc/ams-hbase/conf"
        }
      }
    },
    {
      "kafka-broker": {
        "properties": {
          "default.replication.factor": "1",
          "offsets.topic.replication.factor": "1"
        },
        "properties_attributes": {}
      }
    },
    {
      "hbase-env": {
        "properties": {
          "phoenix_sql_enabled": "true"
        },
        "properties_attributes": {}
      }
    },
    {
      "druid-common": {
        "properties": {
          "druid.extensions.loadList": "[\"postgresql-metadata-storage\", \"druid-datasketches\", \"druid-hdfs-storage\", \"druid-kafka-indexing-service\", \"ambari-metrics-emitter\"]",
          "druid.indexer.logs.directory": "/user/druid/logs",
          "druid.indexer.logs.type": "hdfs",
          "druid.metadata.storage.connector.connectURI": "jdbc:postgresql://%HOSTGROUP::master1%:5432/druid",
          "druid.metadata.storage.connector.password": "druid",
          "druid.metadata.storage.connector.user": "druid",
          "druid.metadata.storage.type": "postgresql",
          "druid.selectors.indexing.serviceName": "druid/overlord",
          "druid.storage.storageDirectory": "/user/druid/data",
          "druid.storage.type": "hdfs"
        },
        "properties_attributes": {}
      }
    },
    {
      "druid-overlord": {
        "properties": {
          "druid.indexer.runner.type": "remote",
          "druid.indexer.storage.type": "metadata",
          "druid.port": "8090",
          "druid.service": "druid/overlord"
        },
        "properties_attributes": {}
      }
    },
    {
      "druid-middlemanager": {
        "properties": {
          "druid.indexer.runner.javaOpts": "-server -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Dhdp.version={{stack_version}} -Dhadoop.mapreduce.job.classloader=true",
          "druid.port": "8091",
          "druid.processing.numThreads": "2",
          "druid.server.http.numThreads": "50",
          "druid.service": "druid/middlemanager",
          "druid.worker.capacity": "3"
        },
        "properties_attributes": {}
      }
    },
    {
      "druid-coordinator": {
        "properties": {
          "druid.coordinator.merge.on": "false",
          "druid.port": "8081"
        },
        "properties_attributes": {}
      }
    },
    {
      "druid-historical": {
        "properties": {
          "druid.port": "8083",
          "druid.processing.numThreads": "2",
          "druid.server.http.numThreads": "50",
          "druid.server.maxSize": "300000000000",
          "druid.service": "druid/historical"
        },
        "properties_attributes": {}
      }
    },
    {
      "druid-broker": {
        "properties": {
          "druid.broker.http.numConnections": "5",
          "druid.cache.type": "local",
          "druid.port": "8082",
          "druid.processing.numThreads": "2",
          "druid.server.http.numThreads": "50",
          "druid.service": "druid/broker"
        },
        "properties_attributes": {}
      }
    },
    {
      "druid-router": {
        "properties": {},
        "properties_attributes": {}
      }
    },
    {
      "superset": {
        "properties": {
          "SECRET_KEY": "{{{ general.password }}}",
          "SUPERSET_DATABASE_TYPE": "sqlite"
        },
        "properties_attributes": {}
      }
    },
    {
      "nifi-ambari-config": {
        "nifi.max_mem": "4g",
        "nifi.security.encrypt.configuration.password": "{{{ general.password }}}",
        "nifi.sensitive.props.key": "{{{ general.password }}}"
      }
    },
    {
      "nifi-properties": {
        "nifi.security.user.login.identity.provider": "",
        "nifi.sensitive.props.key": "{{{ general.password }}}"
      }
    },
    {
      "nifi-registry-ambari-config": {
        "nifi.registry.security.encrypt.configuration.password": "{{{ general.password }}}"
      }
    },
    {
      "nifi-registry-properties": {
        "nifi.registry.db.password": "{{{ general.password }}}",
        "nifi.registry.sensitive.props.key": "{{{ general.password }}}"
      }
    },
    {
      "registry-common": {
        "properties": {
          "adminPort": "7789",
          "database_name": "registry",
          "jar.storage": "/hdf/registry",
          "jar.storage.hdfs.url": "hdfs://localhost:9090",
          "jar.storage.type": "local",
          "port": "7788",
          "registry.schema.cache.expiry.interval": "3600",
          "registry.schema.cache.size": "10000",
          "registry.storage.connector.connectURI": "jdbc:mysql://localhost:3306/registry",
          "registry.storage.connector.password": "registry",
          "registry.storage.connector.user": "registry",
          "registry.storage.query.timeout": "30",
          "registry.storage.type": "mysql"
        },
        "properties_attributes": {}
      }
    },
    {
      "hbase-site": {
        "properties": {
          "hbase.bucketcache.combinedcache.enabled": "true",
          "hbase.bucketcache.ioengine": "file:/hbase/cache",
          "hbase.bucketcache.size": "24000",
          "hbase.defaults.for.version.skip": "true",
          "hbase.hregion.max.filesize": "21474836480",
          "hbase.hregion.memstore.flush.size": "536870912",
          "hbase.region.server.rpc.scheduler.factory.class": "org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory",
          "hbase.regionserver.global.memstore.size": "0.4",
          "hbase.regionserver.handler.count": "60",
          "hbase.regionserver.wal.codec": "org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec",
          "hbase.rootdir": "/apps/hbase",
          "hbase.rpc.controllerfactory.class": "org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory",
          "hbase.rs.cacheblocksonwrite": "true",
          "hfile.block.bloom.cacheonwrite": "true",
          "hfile.block.cache.size": "0.4",
          "hfile.block.index.cacheonwrite": "true",
          "phoenix.functions.allowUserDefinedFunctions": "true",
          "phoenix.query.timeoutMs": "60000"
        },
        "properties_attributes": {}
      }
    },
    {
      "hbase-env": {
        "properties": {
          "hbase_java_io_tmpdir": "/tmp",
          "hbase_log_dir": "/var/log/hbase",
          "hbase_master_heapsize": "1024m",
          "hbase_pid_dir": "/var/run/hbase",
          "hbase_regionserver_heapsize": "16384m",
          "hbase_regionserver_shutdown_timeout": "30",
          "hbase_regionserver_xmn_max": "16384",
          "hbase_regionserver_xmn_ratio": "0.2",
          "hbase_user": "hbase",
          "hbase_user_nofile_limit": "32000",
          "hbase_user_nproc_limit": "16000",
          "phoenix_sql_enabled": "true"
        },
        "properties_attributes": {}
      }
    }
  ],
  "host_groups": [
    {
      "cardinality": "1",
      "components": [
        {
          "name": "RANGER_TAGSYNC"
        },
        {
          "name": "RANGER_USERSYNC"
        },
        {
          "name": "RANGER_ADMIN"
        },
        {
          "name": "KNOX_GATEWAY"
        },
        {
          "name": "HIVE_SERVER"
        },
        {
          "name": "HIVE_METASTORE"
        },
        {
          "name": "DRUID_OVERLORD"
        },
        {
          "name": "DRUID_COORDINATOR"
        },
        {
          "name": "DRUID_ROUTER"
        },
        {
          "name": "DRUID_BROKER"
        },
        {
          "name": "SECONDARY_NAMENODE"
        },
        {
          "name": "HISTORYSERVER"
        },
        {
          "name": "APP_TIMELINE_SERVER"
        },
        {
          "name": "REGISTRY_SERVER"
        },
        {
          "name": "NIFI_REGISTRY_MASTER"
        },
        {
          "name": "DATANODE"
        },
        {
          "name": "YARN_CLIENT"
        },
        {
          "name": "HDFS_CLIENT"
        },
        {
          "name": "TEZ_CLIENT"
        },
        {
          "name": "INFRA_SOLR_CLIENT"
        },
        {
          "name": "ZOOKEEPER_CLIENT"
        },
        {
          "name": "MAPREDUCE2_CLIENT"
        },
        {
          "name": "ATLAS_CLIENT"
        },
        {
          "name": "HBASE_CLIENT"
        },
        {
          "name": "HIVE_CLIENT"
        },
        {
          "name": "LOGSEARCH_LOGFEEDER"
        },
        {
          "name": "SPARK2_CLIENT"
        }
      ],
      "name": "master1"
    },
    {
      "cardinality": "1+",
      "components": [
        {
          "name": "NODEMANAGER"
        },
        {
          "name": "DATANODE"
        },
        {
          "name": "HBASE_REGIONSERVER"
        },
        {
          "name": "MAPREDUCE2_CLIENT"
        },
        {
          "name": "YARN_CLIENT"
        },
        {
          "name": "HDFS_CLIENT"
        },
        {
          "name": "TEZ_CLIENT"
        },
        {
          "name": "ZOOKEEPER_CLIENT"
        },
        {
          "name": "ATLAS_CLIENT"
        },
        {
          "name": "HBASE_CLIENT"
        },
        {
          "name": "HIVE_CLIENT"
        },
        {
          "name": "METRICS_MONITOR"
        },
        {
          "name": "LOGSEARCH_LOGFEEDER"
        },
        {
          "name": "KAFKA_BROKER"
        },
        {
          "name": "NIFI_MASTER"
        },
        {
          "name": "ZOOKEEPER_SERVER"
        },
        {
          "name": "SPARK2_CLIENT"
        }
      ],
      "name": "worker"
    },
    {
      "cardinality": "1",
      "components": [
        {
          "name": "ATLAS_SERVER"
        },
        {
          "name": "HBASE_MASTER"
        },
        {
          "name": "METRICS_COLLECTOR"
        },
        {
          "name": "RESOURCEMANAGER"
        },
        {
          "name": "DRUID_HISTORICAL"
        },
        {
          "name": "DRUID_MIDDLEMANAGER"
        },
        {
          "name": "LIVY2_SERVER"
        },
        {
          "name": "SPARK2_JOBHISTORYSERVER"
        },
        {
          "name": "DATANODE"
        },
        {
          "name": "HIVE_CLIENT"
        },
        {
          "name": "ZOOKEEPER_CLIENT"
        },
        {
          "name": "ATLAS_CLIENT"
        },
        {
          "name": "MAPREDUCE2_CLIENT"
        },
        {
          "name": "TEZ_CLIENT"
        },
        {
          "name": "HBASE_CLIENT"
        },
        {
          "name": "METRICS_MONITOR"
        },
        {
          "name": "LOGSEARCH_LOGFEEDER"
        },
        {
          "name": "LOGSEARCH_SERVER"
        },
        {
          "name": "NAMENODE"
        },
        {
          "name": "SUPERSET"
        },
        {
          "name": "NIFI_CA"
        },
        {
          "name": "INFRA_SOLR"
        },
        {
          "name": "METRICS_GRAFANA"
        },
        {
          "name": "SPARK2_CLIENT"
        }
      ],
      "name": "master2"
    },
    {
      "name": "cdsw_worker",
      "cardinality": "1+",
      "components": [
        {
          "name": "SPARK2_CLIENT"
        },
        {
          "name": "ZOOKEEPER_CLIENT"
        },
        {
          "name": "YARN_CLIENT"
        },
        {
          "name": "HDFS_CLIENT"
        },
        {
          "name": "MAPREDUCE2_CLIENT"
        },
        {
          "name": "HIVE_CLIENT"
        },
        {
          "name": "NODEMANAGER"
        },
        {
          "name": "DATANODE"
        },
        {
          "name": "KAFKA_BROKER"
        },
        {
          "name": "NIFI_MASTER"
        },
        {
          "name": "ZOOKEEPER_SERVER"
        },
        {
          "name": "HBASE_REGIONSERVER"
        },
        {
          "name": "HBASE_CLIENT"
        },
        {
          "name": "METRICS_MONITOR"
        },
        {
          "name": "TEZ_CLIENT"
        }
      ]
    }
  ],
  "settings": [
    {
      "recovery_settings": [
        {
          "recovery_enabled": "false"
        }
      ]
    }
  ]
}
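
You can paste this JSON into the Cloudbreak UI (Blueprints > Create Blueprint), or register it with the Cloudbreak CLI, roughly as follows (a sketch, assuming the cb CLI is already configured against your Cloudbreak instance and the JSON above is saved locally; double-check the sub-command with cb blueprint create --help):

# Register the blueprint from a local file
cb blueprint create from-file --name edge-to-ai-3.1 --file ./edge-to-ai-3.1.json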

Recipes

108163-screen-shot-2019-04-23-at-32047-pm.png

Pre-Ambari-start recipe to set up the metastores

#!/usr/bin/env bash

# Initialize metastores

yum install -y https://download.postgresql.org/pub/repos/yum/9.6/redhat/rhel-7-x86_64/pgdg-redhat96-9.6-3.noarch.rp...
yum install -y postgresql96-server
yum install -y postgresql96-contrib
/usr/pgsql-9.6/bin/postgresql96-setup initdb
sed -i 's,#port = 5432,port = 5433,g' /var/lib/pgsql/9.6/data/postgresql.conf

echo '' >  /var/lib/pgsql/9.6/data/pg_hba.conf
echo 'local all das,streamsmsgmgr,cloudbreak,registry,ambari,postgres,hive,ranger,rangerdba,rangeradmin,rangerlogger,druid           trust  ' >> /var/lib/pgsql/9.6/data/pg_hba.conf
echo 'host  all das,streamsmsgmgr,cloudbreak,registry,ambari,postgres,hive,ranger,rangerdba,rangeradmin,rangerlogger,druid 0.0.0.0/0 trust  ' >> /var/lib/pgsql/9.6/data/pg_hba.conf
echo 'host  all das,streamsmsgmgr,cloudbreak,registry,ambari,postgres,hive,ranger,rangerdba,rangeradmin,rangerlogger,druid ::/0      trust  ' >> /var/lib/pgsql/9.6/data/pg_hba.conf
echo 'local all             all                                              peer  ' >> /var/lib/pgsql/9.6/data/pg_hba.conf
echo 'host  all             all             127.0.0.1/32                      ident  ' >> /var/lib/pgsql/9.6/data/pg_hba.conf
echo 'host  all             all             ::1/128                           ident  ' >> /var/lib/pgsql/9.6/data/pg_hba.conf

systemctl enable postgresql-9.6.service
systemctl start postgresql-9.6.service

echo "CREATE DATABASE streamsmsgmgr;" | sudo -u postgres psql -U postgres -h localhost -p 5433
echo "CREATE USER streamsmsgmgr WITH PASSWORD 'streamsmsgmgr';" | sudo -u postgres psql -U postgres -h localhost -p 5433
echo "GRANT ALL PRIVILEGES ON DATABASE streamsmsgmgr TO streamsmsgmgr;" | sudo -u postgres psql -U postgres -h localhost -p 5433

echo "CREATE DATABASE druid;" | sudo -u postgres psql -U postgres
echo "CREATE DATABASE ranger;" | sudo -u postgres psql -U postgres
echo "CREATE DATABASE registry;" | sudo -u postgres psql -U postgres
echo "CREATE USER druid WITH PASSWORD 'druid';" | sudo -u postgres psql -U postgres
echo "CREATE USER registry WITH PASSWORD 'registry';" | sudo -u postgres psql -U postgres
echo "CREATE USER rangerdba WITH PASSWORD 'rangerdba';" | sudo -u postgres psql -U postgres
echo "CREATE USER rangeradmin WITH PASSWORD 'ranger';" | sudo -u postgres psql -U postgres
echo "GRANT ALL PRIVILEGES ON DATABASE druid TO druid;" | sudo -u postgres psql -U postgres
echo "GRANT ALL PRIVILEGES ON DATABASE registry TO registry;" | sudo -u postgres psql -U postgres
echo "GRANT ALL PRIVILEGES ON DATABASE ranger TO rangerdba;" | sudo -u postgres psql -U postgres
echo "GRANT ALL PRIVILEGES ON DATABASE ranger TO rangeradmin;" | sudo -u postgres psql -U postgres

#ambari-server setup --jdbc-db=postgres --jdbc-driver=/usr/share/java/postgresql-jdbc.jar

if [[ $(cat /etc/system-release|grep -Po Amazon) == "Amazon" ]]; then         
 echo '' >  /var/lib/pgsql/9.5/data/pg_hba.conf
 echo 'local all cloudbreak,ambari,postgres,hive,ranger,rangerdba,rangeradmin,rangerlogger,druid,registry           trust  ' >> /var/lib/pgsql/9.5/data/pg_hba.conf
 echo 'host  all cloudbreak,ambari,postgres,hive,ranger,rangerdba,rangeradmin,rangerlogger,druid,registry 0.0.0.0/0 trust  ' >> /var/lib/pgsql/9.5/data/pg_hba.conf
 echo 'host  all cloudbreak,ambari,postgres,hive,ranger,rangerdba,rangeradmin,rangerlogger,druid,registry ::/0      trust  ' >> /var/lib/pgsql/9.5/data/pg_hba.conf
 echo 'local all             all                                         peer   ' >> /var/lib/pgsql/9.5/data/pg_hba.conf
 echo 'host  all             all             127.0.0.1/32                 ident  ' >> /var/lib/pgsql/9.5/data/pg_hba.conf
 echo 'host  all             all             ::1/128                      ident  ' >> /var/lib/pgsql/9.5/data/pg_hba.conf
 
 sudo -u postgres /usr/pgsql-9.5/bin/pg_ctl -D /var/lib/pgsql/9.5/data/ reload
else
 echo '' >  /var/lib/pgsql/data/pg_hba.conf
 echo 'local all cloudbreak,ambari,postgres,hive,ranger,rangerdba,rangeradmin,rangerlogger,druid,registry           trust  ' >> /var/lib/pgsql/data/pg_hba.conf
 echo 'host  all cloudbreak,ambari,postgres,hive,ranger,rangerdba,rangeradmin,rangerlogger,druid,registry 0.0.0.0/0 trust  ' >> /var/lib/pgsql/data/pg_hba.conf
 echo 'host  all cloudbreak,ambari,postgres,hive,ranger,rangerdba,rangeradmin,rangerlogger,druid,registry ::/0      trust  ' >> /var/lib/pgsql/data/pg_hba.conf
 echo 'local all             all                                          peer   ' >> /var/lib/pgsql/data/pg_hba.conf
 echo 'host  all             all             127.0.0.1/32                 ident  ' >> /var/lib/pgsql/data/pg_hba.conf
 echo 'host  all             all             ::1/128                      ident  ' >> /var/lib/pgsql/data/pg_hba.conf
 
 sudo -u postgres pg_ctl -D /var/lib/pgsql/data/ reload
fi


yum remove -y mysql57-community*
yum remove -y mysql56-server*
yum remove -y mysql-community*
rm -Rvf /var/lib/mysql

yum install -y epel-release
yum install -y libffi-devel.x86_64
ln -s /usr/lib64/libffi.so.6 /usr/lib64/libffi.so.5

yum install -y mysql-connector-java*
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar

if [[ $(cat /etc/system-release|grep -Po Amazon) == "Amazon" ]]; then
 yum install -y mysql56-server
 service mysqld start
else
 yum localinstall -y https://dev.mysql.com/get/mysql-community-release-el7-5.noarch.rpm
 yum install -y mysql-community-server
 systemctl start mysqld.service
fi
chkconfig --add mysqld
chkconfig mysqld on

ln -s /usr/share/java/mysql-connector-java.jar /usr/hdp/current/hive-client/lib/mysql-connector-java.jar 
ln -s /usr/share/java/mysql-connector-java.jar /usr/hdp/current/hive-server2-hive2/lib/mysql-connector-java.jar

mysql --execute="CREATE DATABASE druid DEFAULT CHARACTER SET utf8"
mysql --execute="CREATE DATABASE registry DEFAULT CHARACTER SET utf8"
mysql --execute="CREATE DATABASE streamline DEFAULT CHARACTER SET utf8"
mysql --execute="CREATE DATABASE streamsmsgmgr DEFAULT CHARACTER SET utf8"
mysql --execute="CREATE USER 'das'@'localhost' IDENTIFIED BY 'dasuser'"
mysql --execute="CREATE USER 'das'@'%' IDENTIFIED BY 'dasuser'"
mysql --execute="CREATE USER 'ranger'@'localhost' IDENTIFIED BY 'ranger'"
mysql --execute="CREATE USER 'ranger'@'%' IDENTIFIED BY 'ranger'"
mysql --execute="CREATE USER 'rangerdba'@'localhost' IDENTIFIED BY 'rangerdba'"
mysql --execute="CREATE USER 'rangerdba'@'%' IDENTIFIED BY 'rangerdba'"
mysql --execute="CREATE USER 'registry'@'localhost' IDENTIFIED BY 'registry'"
mysql --execute="CREATE USER 'registry'@'%' IDENTIFIED BY 'registry'"
mysql --execute="CREATE USER 'streamsmsgmgr'@'localhost' IDENTIFIED BY 'streamsmsgmgr'"
mysql --execute="CREATE USER 'streamsmsgmgr'@'%' IDENTIFIED BY 'streamsmsgmgr'"
mysql --execute="CREATE USER 'druid'@'%' IDENTIFIED BY 'druid'"
mysql --execute="CREATE USER 'streamline'@'%' IDENTIFIED BY 'streamline'"
mysql --execute="CREATE USER 'streamline'@'localhost' IDENTIFIED BY 'streamline'"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'das'@'localhost'"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'das'@'%'"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'das'@'localhost' WITH GRANT OPTION"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'das'@'%' WITH GRANT OPTION"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'ranger'@'localhost'"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'ranger'@'%'"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'ranger'@'localhost' WITH GRANT OPTION"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'ranger'@'%' WITH GRANT OPTION"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'rangerdba'@'localhost'"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'rangerdba'@'%'"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'rangerdba'@'localhost' WITH GRANT OPTION"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'rangerdba'@'%' WITH GRANT OPTION"
mysql --execute="GRANT ALL PRIVILEGES ON druid.* TO 'druid'@'%' WITH GRANT OPTION"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'registry'@'localhost'"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'registry'@'%'"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'registry'@'localhost' WITH GRANT OPTION"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'registry'@'%' WITH GRANT OPTION"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'streamsmsgmgr'@'localhost'"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'streamsmsgmgr'@'%'"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'streamsmsgmgr'@'localhost' WITH GRANT OPTION"
mysql --execute="GRANT ALL PRIVILEGES ON *.* TO 'streamsmsgmgr'@'%' WITH GRANT OPTION"
mysql --execute="GRANT ALL PRIVILEGES ON streamline.* TO 'streamline'@'%' WITH GRANT OPTION"

mysql --execute="CREATE DATABASE beast_mode_db DEFAULT CHARACTER SET utf8"
mysql --execute="CREATE USER 'bmq_user'@'localhost' IDENTIFIED BY 'Be@stM0de'"
mysql --execute="CREATE USER 'bmq_user'@'%' IDENTIFIED BY 'Be@stM0de'"
mysql --execute="GRANT ALL PRIVILEGES ON beast_mode_db.* TO 'bmq_user'@'localhost'"
mysql --execute="GRANT ALL PRIVILEGES ON beast_mode_db.* TO 'bmq_user'@'%'"
mysql --execute="GRANT ALL PRIVILEGES ON beast_mode_db.* TO 'bmq_user'@'localhost' WITH GRANT OPTION"
mysql --execute="GRANT ALL PRIVILEGES ON beast_mode_db.* TO 'bmq_user'@'%' WITH GRANT OPTION"
mysql --execute="FLUSH PRIVILEGES"
mysql --execute="COMMIT"


#remount tmpfs to ensure NOEXEC is disabled
if grep -Eq '^[^ ]+ /tmp [^ ]+ ([^ ]*,)?noexec[, ]' /proc/mounts; then
  echo "/tmp found as noexec, remounting..."
  mount -o remount,size=10G /tmp
  mount -o remount,exec /tmp
else
  echo "/tmp not found as noexec, skipping..."
fi
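
Once this recipe has run, a quick sanity check is to confirm the databases and users were actually created (a minimal sketch, to be run on the node where the recipe executed):

# Databases on the default PostgreSQL instance (druid, ranger, registry, ...)
sudo -u postgres psql -c '\l'

# The streamsmsgmgr database lives on the second PostgreSQL instance listening on port 5433
sudo -u postgres psql -p 5433 -c '\l'

# Databases created on MySQL (druid, registry, streamline, streamsmsgmgr, beast_mode_db)
mysql -e 'SHOW DATABASES;'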

Pre-Ambari-start recipe to grow the root volume for the CDSW worker

#!/usr/bin/env bash

# WARNING: This script is only for RHEL7 on Azure

# growing the /dev/sda2 partition


sed -e 's/\s*\([\+0-9a-zA-Z]*\).*/\1/' << EOF | fdisk /dev/sda
  d # delete
  2 # delete partition 2
  n # new 
  p # partition
  2 # partition 2 
  # default
  # default
  w # write the partition table
  q # and we're done
EOF

reboot

Post-cluster-install recipe to set up CDSW

#!/usr/bin/env bash

# WARNING: This script is only for RHEL7 on Azure

# growing the /dev/sda2 partition


xfs_growfs /dev/sda2


# Some of these installs may be unnecessary but are included for completeness against the documentation
yum -y install nfs-utils libseccomp lvm2 bridge-utils libtool-ltdl ebtables rsync policycoreutils-python ntp bind-utils nmap-ncat openssl e2fsprogs redhat-lsb-core socat selinux-policy-base selinux-policy-targeted 

# CDSW wants a pristine IPTables setup
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -t nat -F
iptables -t mangle -F
iptables -F
iptables -X

# set java_home on centos7
#echo 'export JAVA_HOME=$(readlink -f /usr/bin/javac | sed "s:/bin/javac::")' >> /etc/profile
#export JAVA_HOME=$(readlink -f /usr/bin/javac | sed "s:/bin/javac::")
echo 'export JAVA_HOME=/usr/lib/jvm/java' >> /etc/profile
export JAVA_HOME='/usr/lib/jvm/java'

# Fetch the node's IP address (used as CDSW MASTER_IP)
export MASTER_IP=$(hostname --ip-address)

# Fetch the public IP, used to build the xip.io wildcard DNS domain
export DOMAIN=$(curl https://ipv4.icanhazip.com)



cd /hadoopfs/
mkdir cdsw

# Install CDSW
#wget -q --no-check-certificate https://s3.eu-west-2.amazonaws.com/whoville/v2/temp.blob
#mv temp.blob cloudera-data-science-workbench-1.5.0.818361-1.el7.centos.x86_64.rpm
wget -q https://archive.cloudera.com/cdsw1/1.5.0/redhat7/yum/RPMS/x86_64/cloudera-data-science-workbench-1.5...
yum install -y cloudera-data-science-workbench-1.5.0.849870-1.el7.centos.x86_64.rpm

# Install Anaconda
curl -Ok https://repo.anaconda.com/archive/Anaconda2-5.2.0-Linux-x86_64.sh
chmod +x ./Anaconda2-5.2.0-Linux-x86_64.sh
./Anaconda2-5.2.0-Linux-x86_64.sh -b -p /anaconda

# create unix user
useradd tutorial
echo "tutorial-password" | passwd --stdin tutorial

su - hdfs -c 'hdfs dfs -mkdir /user/tutorial'
su - hdfs -c 'hdfs dfs -chown tutorial:hdfs /user/tutorial'

# CDSW Setup
sed -i "s@MASTER_IP=\"\"@MASTER_IP=\"${MASTER_IP}\"@g" /etc/cdsw/config/cdsw.conf
sed -i "s@JAVA_HOME=\"/usr/java/default\"@JAVA_HOME=\"$(echo ${JAVA_HOME})\"@g" /etc/cdsw/config/cdsw.conf
sed -i "s@DOMAIN=\"cdsw.company.com\"@DOMAIN=\"${DOMAIN}.xip.io\"@g" /etc/cdsw/config/cdsw.conf
sed -i "s@DOCKER_BLOCK_DEVICES=\"\"@DOCKER_BLOCK_DEVICES=\"${DOCKER_BLOCK}\"@g" /etc/cdsw/config/cdsw.conf
sed -i "s@APPLICATION_BLOCK_DEVICE=\"\"@APPLICATION_BLOCK_DEVICE=\"${APP_BLOCK}\"@g" /etc/cdsw/config/cdsw.conf
sed -i "s@DISTRO=\"\"@DISTRO=\"HDP\"@g" /etc/cdsw/config/cdsw.conf
sed -i "s@ANACONDA_DIR=\"\"@ANACONDA_DIR=\"/anaconda/bin\"@g" /etc/cdsw/config/cdsw.conf

# CDSW will break default Amazon DNS on 127.0.0.1:53, so we use a different IP
sed -i "s@nameserver 127.0.0.1@nameserver 169.254.169.253@g" /etc/dhcp/dhclient-enter-hooks

cdsw init

echo "CDSW will shortly be available on ${DOMAIN}"


# after the init, we wait until we are able to create the tutorial user
export respCode=404

while (( $respCode != 201 ))
do
    sleep 10
    export respCode=$(curl -iX POST http://${DOMAIN}.xip.io/api/v1/users/ -H 'Content-Type: application/json' -d '{"email":"tutorial@tutorial.com","name":"tutorial","username":"tutorial","password":"tutorial-password","type":"user","admin":true}' | grep HTTP | awk '{print $2}')
done

exit 0

Note: this script relies on xip.io and hacks directly into Unix to create the user and Hadoop folders; this is not a recommended approach for production!
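
These three scripts are registered in Cloudbreak as recipes (Cluster Extensions > Recipes), with the execution type matching their descriptions above. If you use the Cloudbreak CLI, the registration looks roughly like this (a sketch; the file names are placeholders, and the exact --execution-type values should be verified against cb recipe create from-file --help):

# Pre-Ambari-start recipes: metastore setup and root volume resize
cb recipe create from-file --name setup-metastores --execution-type pre-ambari-start --file ./setup-metastores.sh
cb recipe create from-file --name grow-root-volume --execution-type pre-ambari-start --file ./grow-root-volume.sh

# Post-cluster-install recipe: CDSW installation
cb recipe create from-file --name setup-cdsw --execution-type post-cluster-install --file ./setup-cdsw.sh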

Management packs

You will need two management packs for this setup; register each one with a name and its repository URL:

[Image: management pack registration]
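
Cloudbreak installs registered management packs on the Ambari server before building the cluster; for reference, the manual equivalent on an Ambari host is roughly the following (the URL is a placeholder for whichever pack you registered):

# Manual equivalent of what Cloudbreak does with a registered management pack
ambari-server install-mpack --mpack=<management-pack-tarball-url> --verbose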

Step 3: Create cluster

This step uses Cloudbreak's Create Cluster wizard. It is fairly self-explanatory if you follow the screenshots, but I will call out the specific parameters in text form for convenience.

Note: Do not forget to toggle the advanced mode when running the wizard (at the top of the screen).
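
Also note that once the wizard is filled in, its Show CLI Command option (if available in your Cloudbreak version) exports the equivalent JSON, which you can replay later with the Cloudbreak CLI, roughly as follows (the file and cluster names here are just examples):

# Recreate the same cluster later without clicking through the wizard
cb cluster create --cli-input-json ./edge-to-ai-cluster.json --name edge-to-ai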

General Configuration

108132-screen-shot-2019-04-23-at-33907-pm.png

Image Settings

108155-screen-shot-2019-04-23-at-33934-pm.png

Hardware and Storage

108165-screen-shot-2019-04-23-at-34824-pm.png

Note: Make sure to use 100 GB as the root volume size for CDSW.

Network and Availability

108116-screen-shot-2019-04-23-at-35004-pm.png

Cloud Storage

108191-screen-shot-2019-04-23-at-35035-pm.png

Cluster Extensions

108264-screen-shot-2019-04-26-at-123127-pm.png

External Sources

108166-screen-shot-2019-04-23-at-35533-pm.png

Gateway Configuration

108192-screen-shot-2019-04-23-at-35600-pm.png

Network Security Groups

108201-screen-shot-2019-04-23-at-35645-pm.png

Security

108133-screen-shot-2019-04-23-at-40512-pm.png

Result

After the cluster is created, you should have access to the following screen in Cloudbreak:

108148-screen-shot-2019-04-23-at-52202-pm.png

You can now access Ambari via the link provided, and CDSW at http://[CDSW_WORKER_PUBLIC_IP].xip.io.
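
If CDSW is not reachable right away, give it a few minutes for all of its services to start; you can follow progress on the cdsw_worker node with the CDSW CLI:

# Check CDSW health and host prerequisites from the CDSW node
cdsw status
cdsw validate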

