Best Practices for Storm Deployment on a Hadoop Cluster using Ambari. How would you allocate components in production?

What are best practices for deploying Storm components on a cluster for scalability and growth? We are thinking of having dedicated nodes for Storm on YARN. Also, would anything go on an edge node?

For example, in a cluster the thought is to have three dedicated Storm nodes (S1, S2, S3) with the following allocations:

Storm Nimbus:

  • Choose S1 as the Storm master and deploy Storm Nimbus there... or perhaps the best practice is not to co-locate Nimbus with any worker node?
  • The S1 node will also host the Storm UI

Storm Supervisors/Workers

  • Choose S1, S2, S3 to deploy Storm Supervisors

Zookeeper Cluster

  • Since Kafka is usually used with Storm, have a dedicated ZooKeeper cluster serving both Kafka and Storm.
  • DON'T put the ZooKeeper cluster on the Kafka nodes (K1, K2, K3).
  • Put ZooKeeper on the Storm nodes (S1, S2, S3) instead.

Storm UI

  • Will be on the same node as Nimbus: S1, or on the edge node

DRPC Server

  • What is the best practice for placing this?

So, in summary, if we have three dedicated nodes for Storm, the thinking is to allocate as follows:

S1 Node:

  1. Storm Nimbus/Storm UI (or maybe it is not a best practice to put Storm Nimbus on worker nodes, and this should go on the edge node instead?)
  2. Storm Supervisor
  3. Zookeeper

S2 Node:

  1. Storm Supervisor
  2. Zookeeper

S3 Node:

  1. Storm Supervisor
  2. Zookeeper

Edge Node:

  1. Storm Nimbus/Storm UI (only if it is not a best practice to put Storm Nimbus on worker nodes, in which case it moves here from S1)

Finally, would the DRPC server go on the Nimbus node? Any thoughts on this? Am I on the right track? Would anything go on an edge node?
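
To make this concrete, the proposed layout could be expressed as an Ambari Blueprint host-group sketch along the following lines. This is a rough sketch only: the blueprint name and stack version are placeholders, the component names are the standard Ambari ones, and the edge-node variant would simply move NIMBUS and STORM_UI_SERVER into their own host group.

    {
      "Blueprints": {
        "blueprint_name": "storm-layout",
        "stack_name": "HDP",
        "stack_version": "2.3"
      },
      "host_groups": [
        {
          "name": "s1",
          "cardinality": "1",
          "components": [
            { "name": "NIMBUS" },
            { "name": "STORM_UI_SERVER" },
            { "name": "SUPERVISOR" },
            { "name": "ZOOKEEPER_SERVER" }
          ]
        },
        {
          "name": "s2_s3",
          "cardinality": "2",
          "components": [
            { "name": "SUPERVISOR" },
            { "name": "ZOOKEEPER_SERVER" }
          ]
        }
      ]
    }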

1 ACCEPTED SOLUTION

Re: Best Practices for Storm Deployment on a Hadoop Cluster using Ambari. How would you allocate components in production?

Hi @Ancil McBarnett, my 2 cents:

  • Nothing on edge nodes; you have no idea what the guys will do there.
  • Nimbus, Storm UI, and DRPC on one of the cluster master nodes. If this is a stand-alone Storm & Kafka cluster, then designate a master node and put these components there together with Ambari.
  • Supervisors on dedicated nodes. In an HDFS cluster you can co-locate them with DataNodes.
  • Dedicated Kafka broker nodes, but see below.
  • Dedicated ZK for Kafka; however, with Kafka 0.8.2 or higher, if consumers don't keep offsets in ZK, traffic is low to medium, and you have at least 3 brokers, then you can start by co-locating Kafka's ZK with the brokers. In that case ZK should use a dedicated disk.
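
As a sketch of that placement, assuming a stand-alone Storm & Kafka cluster with enough nodes, the blueprint host groups could look like this (group names and cardinalities are illustrative; Storm's own ZooKeeper quorum on the master nodes is omitted, and the Kafka group shows the low-to-medium-traffic case where Kafka's ZK rides on the brokers with its own disk):

    {
      "host_groups": [
        {
          "name": "master",
          "cardinality": "1",
          "components": [
            { "name": "NIMBUS" },
            { "name": "STORM_UI_SERVER" },
            { "name": "DRPC_SERVER" }
          ]
        },
        {
          "name": "storm_supervisors",
          "cardinality": "3",
          "components": [
            { "name": "SUPERVISOR" }
          ]
        },
        {
          "name": "kafka_brokers",
          "cardinality": "3",
          "components": [
            { "name": "KAFKA_BROKER" },
            { "name": "ZOOKEEPER_SERVER" }
          ]
        }
      ]
    }
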
13 REPLIES

Re: Best Practices for Storm Deployment on a Hadoop Cluster using Ambari. How would you allocate components in production?

Mentor

Storm supervisors on their own nodes; Kafka brokers can be co-located with DataNodes. Those are my findings from our recent POC. I can give you more detail by phone. @Ancil McBarnett

Re: Best Practices for Storm Deployment on a Hadoop Cluster using Ambari. How would you allocate components in production?

Mentor

@Ancil McBarnett Have you looked at this guide?

Re: Best Practices for Storm Deployment on a Hadoop Cluster using Ambari. How would you allocate components in production?

Yes, but it does not go into deployment from a cluster-topology point of view (except for the discussion of ZooKeeper). @Wes Floyd

Re: Best Practices for Storm Deployment on a Hadoop Cluster using Ambari. How would you allocate components in production?

@Predrag Minovic why not supervisors on their own nodes rather than on DataNodes?

Re: Best Practices for Storm Deployment on a Hadoop Cluster using Ambari. How would you allocate components in production?

Mentor

@Ancil McBarnett @tgoetz suggests putting supervisors on their own nodes.

Re: Best Practices for Storm Deployment on a Hadoop Cluster using Ambari. How would you allocate components in production?

@Ancil McBarnett Oh yes, supervisors definitely on dedicated nodes if you have enough nodes. I updated my answer.

Re: Best Practices for Storm Deployment on a Hadoop Cluster using Ambari. How would you allocate components in production?

As per

https://community.hortonworks.com/content/kbentry/550/unofficial-storm-and-kafka-best-practices-guid...

ZK goes on separate nodes from the Kafka brokers. Do not install ZK on the same node as a Kafka broker if you want optimal Kafka performance, because both Kafka and ZK are disk-I/O intensive.
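
If you do end up co-locating them anyway, the usual mitigation is to give ZK its own disk; in a blueprint that can be set through the zoo.cfg config type (the mount path below is a placeholder):

    {
      "configurations": [
        {
          "zoo.cfg": {
            "properties": {
              "dataDir": "/zk-disk/zookeeper"
            }
          }
        }
      ]
    }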

Re: Best Practices for Storm Deployment on a Hadoop Cluster using Ambari. How would you allocate components in production?

This is not a complete answer, but I would also add that, by default, Kafka brokers write to local storage (not HDFS) and therefore benefit from fast local disks (SSDs) and/or multiple spindles to parallelize writes across partitions. I don't know of a formula to calculate this, but try to maximize I/O throughput to disk, allocating spindles up to the number of available CPUs per node. Many Hadoop architectures don't really specify an allocation for local storage (beyond the OS disk), so it is something to be aware of.
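
For example, the standard way to use multiple spindles is to list one Kafka log directory per data disk, so writes to different partitions land on different disks. In a blueprint this goes under the kafka-broker config type; the mount paths below are placeholders:

    {
      "configurations": [
        {
          "kafka-broker": {
            "properties": {
              "log.dirs": "/data1/kafka-logs,/data2/kafka-logs,/data3/kafka-logs"
            }
          }
        }
      ]
    }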
