
The objective of this article is to demonstrate how to rapidly deploy a demo Druid & LLAP cluster, preloaded with 20 years (nearly 113 million records) of airline data ready for analytics, using CloudBreak on any IaaS. The entire deployment is UI driven, with no large administration overhead. All artifacts mentioned in this article are publicly available for reuse, so you can try this on your own.



Prolegomenon

Time series is an incredible capability, heavily leveraged within the IoT space. Current solutions are either non-scalable and expensive, or distributed processing engines that lack low-latency OLAP speeds. Druid is an OLAP time-series engine backed by a Lambda architecture. Druid's out-of-the-box SQL capabilities are severely limited and lack join support. Layering HiveQL over Druid brings the best of both worlds. Hive 3 also offers HiveQL over Kafka, essentially making Hive a true SQL federation engine. With Druid's native Kafka integration, streaming data from Kafka directly into Druid while executing real-time SQL queries via HiveQL offers a comprehensive time-series solution for the IoT space.
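To make the HiveQL-over-Druid pattern concrete, here is a minimal sketch assuming a running LLAP cluster (HiveServer2 Interactive, commonly on port 10500) and an existing Druid datasource; the host name and datasource name below are placeholders, not artifacts from this demo:

# Map an existing Druid datasource into Hive, then query it with
# plain HiveQL. Host, port, and datasource name are placeholders.
beeline -u "jdbc:hive2://llap-host:10500/default" <<'SQL'
-- Expose the Druid datasource as an external Hive table
CREATE EXTERNAL TABLE IF NOT EXISTS airline_druid
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "airline_data");

-- Standard HiveQL (including joins against other Hive tables)
-- now works against Druid
SELECT `__time`, COUNT(*) AS events
FROM airline_druid
GROUP BY `__time`
LIMIT 10;
SQL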

On with the demo....

To begin the demonstration, launch a CloudBreak deployer instance on any IaaS or on an on-prem VM. The quick start makes this super simple, and launching the CloudBreak deployer is well documented here.
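If you skip the quick start and install manually, the flow on a VM with Docker already installed looks roughly like this (a sketch for a standard Cloudbreak 2.x deployer install; the directory and IP are placeholders):

mkdir -p /opt/cloudbreak-deployment && cd /opt/cloudbreak-deployment

# Minimal Profile; PUBLIC_IP must be reachable from your browser
echo 'export PUBLIC_IP=203.0.113.10' > Profile

cbd generate          # render docker-compose.yml and supporting config
cbd start             # pull images on first run and start the deployer
cbd logs cloudbreak   # tail logs until the UI reports it is up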

Once the CloudBreak deployer is up, add your Azure, AWS, GCP, or OpenStack credentials within the CloudBreak UI. This will allow deployment of the same cluster on any IaaS.

Druid Blueprint

  • To launch a Druid/LLAP cluster, an Ambari blueprint is required. Click on Blueprints

93606-blueprints.jpg

  • Click on CREATE BLUEPRINT

93607-create-blueprint.jpg

  • Name the blueprint and enter the following URL to import it into CloudBreak (a quick way to inspect the blueprint first is shown below)
https://s3-us-west-2.amazonaws.com/sunileman1/ambari-blueprints/hdp3/druid+llap+Ambari+Blueprint

93608-blueprint-url.jpg
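Before importing, you can sanity check the blueprint JSON from any shell to see which services (Druid, Hive LLAP, etc.) each host group carries:

# Fetch and pretty-print the blueprint to review its host groups
curl -s "https://s3-us-west-2.amazonaws.com/sunileman1/ambari-blueprints/hdp3/druid+llap+Ambari+Blueprint" \
  | python -m json.tool | less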

Recipes

  • Druid requires a metastore. Import the recipe that creates the metastore into CloudBreak
  • Under Cluster Extensions, click on Recipes

93610-cluster-extensions-recipe.jpg

  • Enter a name for the recipe and select "pre-ambari-start" so the recipe runs prior to Ambari starting
  • Under URL, enter the following to import the recipe into CloudBreak (a sketch of what such a recipe looks like follows the screenshot)
https://s3-us-west-2.amazonaws.com/sunileman1/scripts/druid+metastore+install.sh

93609-druid-metastore-install-script.jpg
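The actual script is at the URL above. Conceptually, a pre-ambari-start recipe is just a shell script run as root on the target host group before Ambari starts. A sketch of a Druid metastore recipe might look like this (database name, user, and password are illustrative placeholders, not the real script's contents):

#!/bin/bash
# Sketch: stand up a MariaDB instance Druid can use as its metadata store
yum install -y mariadb-server
systemctl enable --now mariadb

mysql -u root <<'EOF'
CREATE DATABASE druid DEFAULT CHARACTER SET utf8;
CREATE USER 'druid'@'%' IDENTIFIED BY 'druid';
GRANT ALL PRIVILEGES ON druid.* TO 'druid'@'%';
FLUSH PRIVILEGES;
EOF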

  • This cluster will come preloaded with 20 years of airline data
  • Enter a name for the recipe and select "post-cluster-install" to run this recipe once the HDP services are up
  • Under URL, enter the following to import the recipe into CloudBreak (again, a sketch follows the screenshot)
https://s3-us-west-2.amazonaws.com/sunileman1/scripts/airline-data.sh

93625-cs12.jpg
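The real script is at the URL above; the shape of a post-cluster-install recipe is simply a shell script that runs once the HDP services are up, roughly along these lines (the dataset URL and HDFS paths here are illustrative placeholders):

#!/bin/bash
# Sketch: download the airline dataset and stage it in HDFS for Hive
cd /tmp
wget -q https://example.s3.amazonaws.com/airline/airline_data.tar.gz   # placeholder URL
tar xzf airline_data.tar.gz

sudo -u hdfs hdfs dfs -mkdir -p /tmp/airline
sudo -u hdfs hdfs dfs -put airline_data/*.csv /tmp/airline/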

Create a Cluster

  • Now that all recipes are in place, the next step is to create a cluster

93612-create-cluster.jpg

  • Select an IaaS to deploy on (credential)
  • Enter a cluster name
  • Select HDP 3.0
  • Select cluster type: Druid LLAP HDP 3
    • This is the new blueprint which was imported in the previous steps

93613-cs1.jpg

  • Select an image. The base image will do for most deployments

93614-cs2.jpg

  • Select instance types
    • Note - I used 64 GB of RAM per node. Additionally, I added 3 compute nodes.

93784-cb20.jpg

  • Select the network to deploy the cluster on. If one is not pre-created, CloudBreak will create one

93616-cs4.jpg

  • CloudBreak can be configured to use S3, ADLS, WASB, or GCS
    • Configuring CloudBreak for S3 here
    • Configuring CloudBreak for ADLS here
    • Configuring CloudBreak for WASB here
    • Configuring CloudBreak for GCS

93617-cs5.jpg

  • On the Worker node, attach the get-airline-data recipe
  • On the Druid Broker node, attach the druid-metastore-install recipe

93628-cs14.jpg

  • External metastores (databases) can be bootstrapped to the cluster. This demo does not require one.

93619-cs7.jpg

  • Knox will not be used for this demo

93620-cs8.jpg

  • Attach a security group to each host group. If a security group is not pre-created, CloudBreak will create one

93621-cs9.jpg

  • Lastly, provide an Ambari password and SSH key

93622-cs10.jpg

  • Cluster deployment and provisioning will begin. Within a few minutes, the cluster will be ready and an Ambari URL will be available (a CLI way to watch progress is shown below).

93623-cs11.jpg
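If you prefer a terminal over the UI while waiting, the Cloudbreak CLI (cb), configured against your deployer, can report cluster state; the cluster name below is a placeholder:

cb cluster list                              # all clusters and their statuses
cb cluster describe --name druid-llap-demo   # detailed JSON for one cluster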

  • The Zeppelin notebook can be imported using the URL below (a REST-based alternative follows the screenshot).
https://s3-us-west-2.amazonaws.com/sunileman1/Zeppelin-NoteBooks/airline_druid.json

93627-cs13.jpg
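Alternatively, the notebook can be imported without the UI via Zeppelin's REST API (the host is a placeholder; Zeppelin on HDP commonly listens on port 9995):

# Download the notebook JSON, then POST it to Zeppelin's import endpoint
curl -s -O https://s3-us-west-2.amazonaws.com/sunileman1/Zeppelin-NoteBooks/airline_druid.json
curl -s -X POST -H 'Content-Type: application/json' \
     --data @airline_druid.json \
     http://zeppelin-host:9995/api/notebook/import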

Here I demonstrated how to rapidly launch a Druid/LLAP cluster preloaded with airline data using CloudBreak. Enjoy Druid; it's crazy fast. HiveQL makes Druid easy to work with, and CloudBreak makes the deployment super quick.


Comments

@sunile_manjee Hey, thanks for the detailed documentation. This cloudbreak-default image has an old version of the Druid Controller Console. I was wondering if there is a newer image which has the latest version of Druid installed.

Thanks,

Kalyan

@Kalyan77 Good question, and I haven't tried yet. In the next few weeks I have an engagement which will require me to find out. I will keep you posted.