
One task everybody faces when setting up a new Hadoop cluster is the allocation of services. Administrators of an existing cluster, on the other hand, might ask themselves: how are my services allocated? I have discussed the visualization of HDP clusters and services quite often recently and therefore decided to share my application for visualizing the current and future state of a cluster (see the link to the hosted app at the end of the article).


What does Service Allocation mean?

Planning a Hadoop cluster involves many steps and tasks. Almost no two setups are the same (although there are some similarities). The service allocation is the part that tells you which services and components run on which node, and how many nodes you have or need. This can be quite tedious and difficult, since not all services play well with each other, every service has different hardware and setup requirements, and adding many services can get confusing. This makes it even more important to have a sound overview of your service allocation.

To plan, document and visualize the service allocation or a complete Hadoop cluster, I have used paper sketches, Excel sheets, text files, PowerPoint slides, Photoshop and other tools. However, these approaches are often time-consuming and hard to edit or re-use. In need of a proper tool, I created this rather small and simple Angular application (at least it was at the beginning), which visualizes a cluster using a simple JSON document as its input source (see below).

There are three ways to create a cluster visualization:

  • Export a live cluster via Ambari's API
  • Create a cluster by writing a JSON document as seen below
  • Build a new cluster with the latest drag-n-drop build feature
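The first option, exporting a live cluster, can be approximated by hand. The following is a minimal sketch, not the application's actual export code: it assumes you have already fetched a host listing from Ambari's REST API (for example `GET /api/v1/clusters/<cluster>/hosts?fields=host_components/HostRoles/component_name`) and that the response has the usual `items` / `Hosts` / `host_components` shape. The function name `hosts_response_to_cluster` and the sample host names are made up for illustration.

```python
import json


def hosts_response_to_cluster(response, name, stack_version, security_type):
    """Convert an Ambari-style host listing into the cluster JSON document
    format used in this article (hypothetical helper, not part of the app)."""
    hosts_info = []
    for item in response.get("items", []):
        # Each host lists its components under host_components[].HostRoles.
        components = [
            host_component["HostRoles"]["component_name"]
            for host_component in item.get("host_components", [])
        ]
        hosts_info.append({
            "host_name": item["Hosts"]["host_name"],
            "components": components,
        })
    return {
        "stack_version": stack_version,
        "security_type": security_type,
        "name": name,
        "hosts_info": hosts_info,
    }


# Hand-written sample shaped like a host listing; the host names are invented.
sample_response = {
    "items": [
        {
            "Hosts": {"host_name": "master1.example.com"},
            "host_components": [
                {"HostRoles": {"component_name": "NAMENODE"}},
                {"HostRoles": {"component_name": "RESOURCEMANAGER"}},
            ],
        },
        {
            "Hosts": {"host_name": "worker1.example.com"},
            "host_components": [
                {"HostRoles": {"component_name": "DATANODE"}},
                {"HostRoles": {"component_name": "NODEMANAGER"}},
            ],
        },
    ]
}

cluster = hosts_response_to_cluster(sample_response, "bigdata", "HDP-2.2", "KERBEROS")
print(json.dumps(cluster, indent=2))
```

The stack version and security type are passed in explicitly here because the host listing alone does not carry them.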

Let's say we have a cluster with 2 master nodes, 1 data node and a couple of different services.

The cluster is defined as:

{
  "stack_version": "HDP-2.2",
  "security_type": "KERBEROS",
  "name": "bigdata",
  "hosts_info": [
    {
      "host_name": "",
      "components": [
        "NAMENODE",
        "RESOURCEMANAGER",
        "APP_TIMELINE_SERVER",
        "HISTORYSERVER",
        "TEZ_CLIENT",
        "YARN_CLIENT",
        "HDFS_CLIENT",
        "HIVE_CLIENT",
        "MAPREDUCE2_CLIENT"
      ]
    },
    {
      "host_name": "",
      "components": [
        "SECONDARY_NAMENODE",
        "HIVE_METASTORE",
        "HIVE_SERVER",
        "HCAT",
        "WEBHCAT_SERVER",
        "TEZ_CLIENT",
        "YARN_CLIENT",
        "HDFS_CLIENT",
        "HIVE_CLIENT",
        "MAPREDUCE2_CLIENT"
      ]
    },
    {
      "host_name": "",
      "components": [
        "DATANODE",
        "NODEMANAGER",
        "TEZ_CLIENT",
        "YARN_CLIENT",
        "HDFS_CLIENT",
        "HIVE_CLIENT",
        "MAPREDUCE2_CLIENT"
      ]
    }
  ]
}
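To get a feel for what such a document encodes, here is a small sketch that reads a cluster definition and lists, for every component, the hosts it is placed on. It is only an illustration of the data structure, not code from the application. The JSON is a trimmed copy of the document above, and because the host names there are empty, the sketch falls back to made-up labels such as `host-1`.

```python
import json
from collections import defaultdict

# Trimmed copy of the cluster document above (fewer components per host).
CLUSTER_JSON = """
{
  "stack_version": "HDP-2.2",
  "security_type": "KERBEROS",
  "name": "bigdata",
  "hosts_info": [
    {"host_name": "", "components": ["NAMENODE", "RESOURCEMANAGER", "HDFS_CLIENT"]},
    {"host_name": "", "components": ["SECONDARY_NAMENODE", "HIVE_SERVER", "HDFS_CLIENT"]},
    {"host_name": "", "components": ["DATANODE", "NODEMANAGER", "HDFS_CLIENT"]}
  ]
}
"""


def component_placement(cluster):
    """Map each component name to the list of hosts it runs on."""
    placement = defaultdict(list)
    for index, host in enumerate(cluster["hosts_info"]):
        # Fall back to a positional label when the host name is empty.
        label = host["host_name"] or f"host-{index + 1}"
        for component in host["components"]:
            placement[component].append(label)
    return dict(placement)


cluster = json.loads(CLUSTER_JSON)
placement = component_placement(cluster)
for component in sorted(placement):
    print(f"{component}: {', '.join(placement[component])}")
```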

As soon as the cluster is imported, you can choose between three views:


Design flexibility through Environments

Environments are exportable stack templates that contain information about the available services and components as well as their groups and colors. To customize the visualization configuration (colors, sorting, ...), you can edit the services and components within the application or in the exported Environment (JSON). This makes it possible to use different output formats for specific clusters, departments, companies and so on, simply by importing the matching Environment when a cluster is imported.

Why you might find this app useful:

  • Planning a new cluster
  • Easy Ambari Blueprint generation
  • Visualizing a cluster for a concept or documentation
  • Quick overview of a cluster (e.g. for support, sysadmins, ...)
  • Consistent visualization/documentation
  • ...

If more people are interested in this project, I will add new features. For example:

  • Filter by node groups (type of node or service or any custom group)
  • Group nodes (Master, Worker, Edge, ...)
  • Implement as Ambari View (?)
  • ...

I hope some might find this tool useful. Looking forward to your feedback 🙂

You can find more screenshots here:

Project & Setup:

The above article mainly focused on version 0.3. Since then, a new version has been released with exciting new features. Read more in the next section below.

Export, Build, Visualize and Deploy - What's new in v0.4

Since the above article was published in October, a lot of changes have been made and the web application has been heavily extended. In this short section, I want to touch briefly on the latest enhancements; more details will follow in an additional article.

What's New?

  • The nodes and their services/components have been completely redesigned and restructured
  • Added an option to switch between full names and acronyms (e.g. Namenode and NN)
  • New data structure for nodes. Nodes can now have multiple hostnames; this is a major change, since it reduces the data footprint immensely and allows the creation of simpler cluster templates
  • Build a Cluster! - A drag-n-drop based user interface to build a cluster
  • Blueprint Generator! - Generate Ambari Blueprints directly from imported or built clusters
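To illustrate why allowing multiple hostnames per node shrinks the data, here is a rough sketch that collapses hosts with identical component lists into a single node entry. The article does not show the actual v0.4 schema, so the `host_names` field and the `group_hosts` helper are assumptions made purely for this example, as are the host names.

```python
def group_hosts(hosts_info):
    """Merge hosts that carry the same set of components into one node
    entry with several hostnames (hypothetical layout, not the app's schema)."""
    groups = {}
    for host in hosts_info:
        # Hosts are considered identical if their component sets match,
        # regardless of the order in which components are listed.
        key = tuple(sorted(host["components"]))
        groups.setdefault(key, []).append(host["host_name"])
    return [
        {"host_names": names, "components": list(key)}
        for key, names in groups.items()
    ]


# Invented example: one master and three identical workers.
hosts_info = [
    {"host_name": "master1", "components": ["NAMENODE", "RESOURCEMANAGER"]},
    {"host_name": "worker1", "components": ["DATANODE", "NODEMANAGER"]},
    {"host_name": "worker2", "components": ["NODEMANAGER", "DATANODE"]},
    {"host_name": "worker3", "components": ["DATANODE", "NODEMANAGER"]},
]

nodes = group_hosts(hosts_info)
for node in nodes:
    print(node["host_names"], node["components"])
```

With this layout, the three workers share a single component list instead of repeating it for every host, which is where the savings for large clusters would come from.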

Build a Cluster - New -

This is definitely one of my favorite features. Instead of writing JSON templates to plan and visualize a cluster, or exporting an existing cluster (although this is the easiest way), it is now possible to build a new cluster using drag-n-drop. The tool supports up to 1000 nodes, dynamic hostnames, HDFS & YARN HA, ...

Blueprints (Beta) - New -

Generate Ambari Blueprints directly from imported or built clusters. General and host-group-specific configurations can be added manually. The generator offers more than one thousand suggested configuration parameters and categories.
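As a rough idea of what such a generation step involves, the sketch below turns a cluster document in the format shown earlier into a bare Ambari Blueprint skeleton (`Blueprints`, `host_groups`, `configurations`). It is not the application's generator: the `cluster_to_blueprint` function, the `host_group_N` naming and the one-host-group-per-host mapping are simplifying assumptions, and security settings and configurations are left out entirely.

```python
import json


def cluster_to_blueprint(cluster, blueprint_name):
    """Build a minimal Ambari Blueprint skeleton from a cluster document
    (illustrative helper; one host group per host, no configurations)."""
    # "HDP-2.2" is split into stack name "HDP" and stack version "2.2".
    stack_name, _, stack_version = cluster["stack_version"].partition("-")
    host_groups = []
    for index, host in enumerate(cluster["hosts_info"], start=1):
        host_groups.append({
            "name": f"host_group_{index}",
            "cardinality": "1",
            "components": [{"name": component} for component in host["components"]],
        })
    return {
        "configurations": [],
        "host_groups": host_groups,
        "Blueprints": {
            "blueprint_name": blueprint_name,
            "stack_name": stack_name,
            "stack_version": stack_version,
        },
    }


# Trimmed version of the example cluster from earlier in the article.
cluster = {
    "stack_version": "HDP-2.2",
    "security_type": "KERBEROS",
    "name": "bigdata",
    "hosts_info": [
        {"host_name": "", "components": ["NAMENODE", "RESOURCEMANAGER"]},
        {"host_name": "", "components": ["DATANODE", "NODEMANAGER"]},
    ],
}

blueprint = cluster_to_blueprint(cluster, "bigdata-blueprint")
print(json.dumps(blueprint, indent=2))
```

A real blueprint would usually merge identical hosts into one host group and add the general and host-group-specific configurations mentioned above.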

Read more in this article about Blueprints and "Build a Cluster"

@Jonas Straub

Very nice! It can be an official blog


Thanks! Sure we could post it in our blog.


Very cool @Jonas Straub