The release of version 1.0 marks another major milestone for Storm. Since becoming an Apache project in September 2013, much work has gone into maturing the feature set and improving performance by reworking or tweaking various core components.
In this blog we take a look at the performance improvements in Storm since its incubation into Apache. To quantify them, we compare the performance numbers of Storm v0.9.0.1, the last pre-Apache release, with those of the most recent release, Storm v1.0.1. Storm v0.9.0.1 has also been used as a reference point for performance comparisons against Heron.
Given recent efforts to benchmark Storm “at scale”, here we examine performance from a different angle: we narrow the focus to a few specific core areas of Storm, using a collection of simple topologies. To contain the scope, we limit ourselves to Storm core (i.e. no Trident).
Each topology was given at least 4 minutes of “warm-up” execution before measurements were taken. Subsequently, after a minimum of 10 minutes, metrics were captured from the Web UI for the last 10-minute window. The captured numbers have been rounded for readability. In all cases ACKing was enabled, with 1 ACKer bolt executor. Throughput (i.e. tuples/sec) was calculated by dividing the total ACK count for the 10-minute window by 600.
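As a concrete example of that arithmetic, a run that records 52.2 million ACKs in the window would be reported as 52,200,000 / 600 = 87,000 tuples/sec (illustrative figures).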
Due to some backward incompatibilities (mostly namespace changes) in Storm, two versions of the topologies were written, one for each Storm version. As a general principle we avoided configuration tweaks to tune performance and stayed with default values. The only config setting we applied was to raise the worker's max heap size to 8GB, to ensure memory was not a limiting factor.
Hardware:
All nodes had the following configuration:
Spout Emit Speed:
Here we measure how fast a single spout can emit tuples.
Topology: This is the simplest topology. It consists of a ConstSpout that repeatedly emits the string “some data” and no bolts. Spout parallelism is set to 1, so there is only one instance of the spout executing. Here we measure the number of emits per second. Latency is not relevant as there are no bolts.
Topology Code:
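The original source is not reproduced here; the sketch below shows how such a topology could be wired up against the v1.0.1 API (the v0.9.0.1 variant differs mainly in the old backtype.storm.* package names). The ConstSpout body, its message-id anchoring, and the SpoutEmitSpeedTopo wrapper are illustrative stand-ins, not the original benchmark code:

```java
import java.util.Map;
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class SpoutEmitSpeedTopo {

    // Minimal stand-in for the ConstSpout described above: emits the same
    // string over and over. The message-id anchoring is an assumption, made
    // so the ACKing path is exercised in the later tests.
    public static class ConstSpout extends BaseRichSpout {
        private final String value;
        private SpoutOutputCollector collector;
        private long msgId = 0;

        public ConstSpout(String value) { this.value = value; }

        @Override
        public void open(Map conf, TopologyContext ctx, SpoutOutputCollector c) {
            this.collector = c;
        }

        @Override
        public void nextTuple() {
            collector.emit(new Values(value), ++msgId); // anchored emit
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer d) {
            d.declare(new Fields("str"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // A single spout executor and no bolts: pure emit speed.
        builder.setSpout("constSpout", new ConstSpout("some data"), 1);

        Config conf = new Config();
        conf.setNumWorkers(1);  // everything in one worker process
        conf.setNumAckers(1);   // 1 ACKer bolt executor, as described above
        // The only tuning applied in these tests: an 8GB worker heap.
        conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx8g");

        StormSubmitter.submitTopology("SpoutEmitSpeed", conf, builder.createTopology());
    }
}
```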
Measurements:
v0.9.0.1:
Emit rate: 108k tuples/sec
v1.0.1:
Emit rate: 3.2 million tuples/sec
Messaging Speed (Intra-worker):
The goal is to measure the speed at which tuples can be transferred between a spout and a bolt running within the same worker process.
Topology: Consists of a ConstSpout that repeatedly emits the string “some data” and a DevNull bolt that ACKs and discards every incoming tuple. The spout, bolt and acker were given 1 executor each. The spout and bolt both ran within the same worker.
Topology Code:
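Again a sketch rather than the original source, reusing the ConstSpout stand-in above and adding a minimal DevNull-style bolt. The class bodies and the localOrShuffleGrouping choice are assumptions:

```java
import java.util.Map;
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class IntraWorkerTopo {

    // Minimal stand-in for the DevNull bolt described above:
    // ACKs every incoming tuple and discards it.
    public static class DevNullBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext ctx, OutputCollector c) {
            this.collector = c;
        }

        @Override
        public void execute(Tuple tuple) {
            collector.ack(tuple); // ACK, then drop the tuple
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer d) {
            // emits nothing
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // One executor each for spout and bolt (plus 1 acker from the config below).
        builder.setSpout("constSpout", new SpoutEmitSpeedTopo.ConstSpout("some data"), 1);
        builder.setBolt("devNullBolt", new DevNullBolt(), 1)
               .localOrShuffleGrouping("constSpout"); // grouping is an assumption

        Config conf = new Config();
        conf.setNumWorkers(1); // single worker: spout-to-bolt messaging stays in-process
        conf.setNumAckers(1);  // (plus the 8GB worker heap setting shown in the first sketch)
        StormSubmitter.submitTopology("IntraWorkerMsgSpeed", conf, builder.createTopology());
    }
}
```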
Measurements:
v0.9.0.1:
Throughput: 87k tuples/sec, Latency: 16 ms
v1.0.1:
Throughput: 233k tuples/sec, Latency: 3.4 ms
Messaging Speed (Inter-worker 1):
The goal is to measure the speed at which tuples are transferred between a spout and a bolt running in two separate worker processes on the same machine.
Topology: Same topology as the one used for intra-worker messaging, but with the spout and bolt run on two separate workers on the same host. The bolt and the acker were observed to be running on the same worker.
Topology Code:
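In code, the only change from the intra-worker sketch is the worker count; Storm's default even scheduler then spreads the three executors (spout, bolt and acker) across the two processes, which is consistent with the bolt and acker being observed in the same worker:

```java
// Same wiring as the intra-worker sketch; two worker processes on one host.
conf.setNumWorkers(2);
```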
Measurements:
v0.9.0.1:
Throughput: 48k tuples/sec, Latency: 170 ms
v1.0.1:
Throughput: 287k tuples/sec, Latency: 8 ms
Messaging Speed (Inter-worker 2):
The goal is to measure the speed at which tuples are transferred when the spout, bolt and acker are all running in separate worker processes on the same machine.
Topology: Same topology as the one used for intra-worker messaging, but with the spout, bolt and acker run on three separate workers on the same host.
Topology Code:
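The corresponding one-line change to the sketch:

```java
// Three worker processes on one host: the spout, bolt and acker
// executors each land in their own process.
conf.setNumWorkers(3);
```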
Measurements:
v0.9.0.1:
Throughput: 43k tuples/sec, Latency: 116 ms
v1.0.1:
Throughput: 292k tuples/sec, Latency: 8.6 ms
Messaging Speed (Inter-host 1):
The goal is to measure the speed at which tuples are transferred between a spout and a bolt running in two separate worker processes on two different machines.
Topology: Same topology as the one used for intra-worker messaging, but with the spout and bolt run on two separate workers on two different hosts. The bolt and the acker were observed to be running on the same worker.
Topology Code:
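The topology config matches Inter-worker 1; the cross-host placement comes from the cluster itself. Assuming each supervisor offers a single worker slot (an assumption about the test setup, not something stated above):

```java
// Two workers as before; with one worker slot per supervisor,
// the two workers necessarily land on different machines.
conf.setNumWorkers(2);
```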
Measurements:
v0.9.0.1:
Throughput: 48k tuples/sec, Latency: 845 ms
v1.0.1:
Throughput: 316k tuples/sec, Latency: 13.3 ms
Messaging Speed (Inter-host 2):
Here we measure the speed at which tuples are transferred when the spout, bolt and acker are all running in separate worker processes on three different machines.
Topology: Again the same topology as Inter-host 1, but this time the acker ran on a separate host.
Topology Code:
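And the three-host variant, under the same one-slot-per-supervisor assumption:

```java
// Three workers, one per host: every hop (spout -> bolt, bolt -> acker)
// now crosses the network.
conf.setNumWorkers(3);
```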
Measurements:
v0.9.0.1:
Throughput: 50k tuples/sec, Latency: 1,700 ms
v1.0.1:
Throughput: 303k tuples/sec, Latency: 7.4 ms
Summary of throughput (tuples/sec):

| Storm version | Spout Emit | MSGing IntraWorker | MSGing InterWorker 1 | MSGing InterWorker 2 | MSGing InterHost 1 | MSGing InterHost 2 |
| --- | --- | --- | --- | --- | --- | --- |
| v0.9.0.1 | 108,000 | 87,000 | 48,000 | 43,000 | 48,000 | 50,000 |
| v1.0.1 | 3,200,000 | 233,000 | 287,000 | 292,000 | 316,000 | 303,000 |
Summary of latency (ms):

| Storm version | MSGing IntraWorker | MSGing InterWorker 1 | MSGing InterWorker 2 | MSGing InterHost 1 | MSGing InterHost 2 |
| --- | --- | --- | --- | --- | --- |
| v0.9.0.1 | 16 | 170 | 116 | 845 | 1,700 |
| v1.0.1 | 3 | 8 | 9 | 13 | 7 |