Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Best Practices for Storm Deployment on a Hadoop Cluster Using Ambari: How Would You Allocate Components in Production?


What are best practices for deploying Storm components on a cluster for scalability and growth? We are considering dedicated nodes for Storm on YARN. Also, would anything go on an edge node?

For example, in a cluster, the thought is to have three dedicated Storm nodes (S1, S2, S3) with the following allocations:

Storm Nimbus:

  • Choose S1 as the Storm master to deploy Storm Nimbus... or perhaps it is a best practice not to co-locate Nimbus with any worker node
  • S1 node will also have the Storm UI

Storm Supervisors/Workers

  • Choose S1, S2, S3 to deploy Storm Supervisors

Zookeeper Cluster

  • Since Kafka is usually used with Storm, have a separate Zookeeper cluster for Kafka and Storm.
  • DON'T put the Zookeeper cluster on the Kafka nodes (K1, K2, K3).
  • Put the Zookeeper servers on the Storm nodes (S1, S2, S3).

Storm UI

  • Will be on the same node as Nimbus: S1, or on the edge node

DRPC Server

  • What is the best practice to place this?

So in summary, if we have three dedicated nodes for Storm, the thinking is to allocate as follows:

S1 Node:

  1. Storm Nimbus / Storm UI (or maybe it is not a best practice to put Storm Nimbus on worker nodes, and it should go on an edge node instead?)
  2. Storm Supervisor
  3. Zookeeper

S2 Node:

  1. Storm Supervisor
  2. Zookeeper

S3 Node:

  1. Storm Supervisor
  2. Zookeeper

Edge Node:

  1. Storm Nimbus / Storm UI (or maybe it is not a best practice to put Storm Nimbus on worker nodes, and it should go on an edge node instead?)

Finally, would the DRPC server go on the Nimbus node? Any thoughts on this? Am I on the right track? Would anything go on an edge node?
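For concreteness, the proposed layout would translate into a storm.yaml along these lines. This is only a sketch: the hostnames s1–s3 are placeholders for the S1–S3 nodes, and `nimbus.seeds` applies to Storm 1.x (older releases use `nimbus.host` instead).

```yaml
# Hypothetical hostnames (s1-s3) matching the proposed S1/S2/S3 layout.
nimbus.seeds: ["s1.example.com"]        # Nimbus on S1 (Storm 1.x; older releases use nimbus.host)
ui.port: 8744                           # Storm UI co-located with Nimbus on S1
drpc.servers: ["s1.example.com"]        # if the DRPC server is placed with Nimbus
storm.zookeeper.servers:                # Zookeeper quorum on the Storm nodes
  - "s1.example.com"
  - "s2.example.com"
  - "s3.example.com"
storm.zookeeper.port: 2181
supervisor.slots.ports: [6700, 6701, 6702, 6703]  # worker slots on each supervisor node
```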

1 ACCEPTED SOLUTION

Master Guru

Hi @Ancil McBarnett, my 2 cents:

  • Nothing on edge nodes; you have no idea what people will do there.
  • Nimbus, Storm UI, and DRPC on one of the cluster master nodes. If this is a stand-alone Storm & Kafka cluster, then designate a master node and put them there together with Ambari.
  • Supervisors on dedicated nodes. In an HDFS cluster you can co-locate them with DataNodes.
  • Dedicated Kafka broker nodes, but see below.
  • Dedicated ZK for Kafka; however, with Kafka 0.8.2 or higher, if consumers don't keep offsets in ZK, traffic is low to medium, and you have at least 3 brokers, then you can start by co-locating the Kafka ZK with the brokers. In that case ZK should use a dedicated disk.
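The dedicated-disk point can be expressed in zoo.cfg by pointing the ZK data directories at a mount not shared with the Kafka log dirs. A sketch, assuming hypothetical broker hostnames k1–k3 and a dedicated disk mounted at /zk:

```properties
# zoo.cfg sketch for a ZK quorum co-located with Kafka brokers k1-k3 (hypothetical names).
# dataDir/dataLogDir live on a dedicated disk so ZK's fsync-heavy writes
# don't compete with broker I/O on the Kafka log disks.
dataDir=/zk/data
dataLogDir=/zk/datalog
clientPort=2181
tickTime=2000
initLimit=10
syncLimit=5
server.1=k1.example.com:2888:3888
server.2=k2.example.com:2888:3888
server.3=k3.example.com:2888:3888
```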


13 REPLIES

Master Mentor

@Ancil McBarnett accept best answer 🙂


This is the picture I have come up with:

1823-screen-shot-2016-02-04-at-61845-pm.png

1824-screen-shot-2016-02-04-at-61856-pm.png

Master Guru

Cute pic! And everything looks good; however, even in the "Light Storm, Kafka" case I'd create a dedicated Kafka ZooKeeper and put it on the Kafka brokers, so that the ZKs write to their own disks (for example, 5 disks for Kafka, 1 for ZK).

New Member

Hi, I am planning to create an Ambari Hadoop Storm cluster, and as this is brand new to me I have some doubts about how best to set it up. Here is what I have for resources:

- Platform: AWS (8 EC2 instances: 1 master, 4 slaves, 3 ZooKeeper workers)

- Tools: As I want to automate the setup, I will use Terraform, Ansible, and an Ambari Blueprint to provision the whole environment

- I researched a little and came to the conclusions below; I would appreciate advice on whether this is a good path:

Thanks

MASTER                 SLAVE                 ZOO
NAMENODE               SECONDARY_NAMENODE    DATANODE
NIMBUS                 RESOURCE_MANAGER      NODEMANAGER
DRPC_SERVER            SUPERVISOR            ZOOKEEPER_SERVER
STORM_UI_SERVER        ZOOKEEPER_CLIENT      METRICS_MONITOR
ZOOKEEPER_CLIENT       METRICS_MONITOR       MAPREDUCE2_CLIENT
HDFS_CLIENT            HDFS_CLIENT           HDFS_CLIENT
PIG                    PIG                   PIG
TEZ_CLIENT             TEZ_CLIENT            TEZ_CLIENT
YARN_CLIENT            YARN_CLIENT           YARN_CLIENT
METRICS_COLLECTOR      HISTORY_SERVER
METRICS_GRAFANA        MAPREDUCE2_CLIENT
APP_TIMELINE_SERVER
HIVE_SERVER            HCAT
HIVE_METASTORE         WEBHCAT_SERVER
MYSQL_SERVER           HIVE_CLIENT
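Since an Ambari Blueprint is mentioned, the allocation above maps directly onto blueprint host groups. A trimmed sketch (the blueprint name, stack version, and group names are illustrative, and each group would carry the full component list from the table, not just the few shown):

```json
{
  "Blueprints": {
    "blueprint_name": "storm-cluster",
    "stack_name": "HDP",
    "stack_version": "2.6"
  },
  "host_groups": [
    {
      "name": "master",
      "cardinality": "1",
      "components": [
        { "name": "NAMENODE" },
        { "name": "NIMBUS" },
        { "name": "DRPC_SERVER" },
        { "name": "STORM_UI_SERVER" }
      ]
    },
    {
      "name": "slave",
      "cardinality": "4",
      "components": [
        { "name": "SECONDARY_NAMENODE" },
        { "name": "RESOURCE_MANAGER" },
        { "name": "SUPERVISOR" }
      ]
    },
    {
      "name": "zoo",
      "cardinality": "3",
      "components": [
        { "name": "DATANODE" },
        { "name": "NODEMANAGER" },
        { "name": "ZOOKEEPER_SERVER" }
      ]
    }
  ]
}
```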