I was trying to dig up some TPS benchmark info for the NiFi ListenHTTP processor, which lets you expose a REST proxy, with no luck, so I tried to create one myself. This is a very rough effort; I can improve it by capturing how many client instances are running at any point, so we can see the TPS drop and rise as the clients drop and rise. At some point I ran out of servers and resources to run more clients, and you will see the chart drop off near the end before going back up. This is where I noticed some of my clients had crashed and I restarted them.
Anyway, getting to the matter.
Purpose
The benchmark only measures how much load a single ListenHTTP processor can handle when subjected to real-world traffic.
Setup
The NiFi cluster is set up on an m4.4xlarge (16 CPU cores, 32 GB RAM). The node also hosts the Kafka broker and ZooKeeper. The HDF version is 3.1.1.
The NiFi flow is a simple ListenHTTP processor forwarding to an UpdateAttribute processor, which burns the flowfile. The idea was to measure only ListenHTTP's performance for receiving a message, creating a flowfile, responding to the client, and forwarding the message to the next processor. The benchmark tries to measure what kind of peak TPS can be achieved.
The NiFi instance runs a site-to-site (S2S) provenance reporting task, which forwards provenance events to another NiFi instance, which in turn forwards them to a Kafka topic. The data is then ingested into Druid using Kafka ingestion; the timestampMillis column of the provenance event is used by Druid for indexing.
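For reference, this kind of Druid Kafka ingestion is driven by a supervisor spec submitted to the overlord. The sketch below is only illustrative, not the exact spec I used: the datasource name, topic name, broker address, overlord host and dimension list are placeholders, but it shows timestampMillis being used as the Druid timestamp column.

import json
import requests

# Illustrative supervisor spec; names and hosts are placeholders.
supervisor_spec = {
    "type": "kafka",
    "dataSchema": {
        "dataSource": "nifi-provenance",
        "parser": {
            "type": "string",
            "parseSpec": {
                "format": "json",
                # Druid indexes on the provenance event's timestampMillis field.
                "timestampSpec": {"column": "timestampMillis", "format": "millis"},
                "dimensionsSpec": {"dimensions": ["componentType", "componentName", "eventType"]}
            }
        },
        "granularitySpec": {
            "type": "uniform",
            "segmentGranularity": "HOUR",
            "queryGranularity": "SECOND",
            # Keep one row per provenance event so simple COUNT(*) queries work.
            "rollup": False
        }
    },
    "tuningConfig": {"type": "kafka"},
    "ioConfig": {
        "topic": "nifi-provenance",
        "consumerProperties": {"bootstrap.servers": "<kafka-broker>:6667"}
    }
}

# Submit the spec to the Druid overlord to start Kafka ingestion.
requests.post("http://<druid-overlord>:8090/druid/indexer/v1/supervisor",
              data=json.dumps(supervisor_spec),
              headers={"Content-Type": "application/json"})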
For the client piece I have a simple Python script that constantly calls the REST service exposed by ListenHTTP, passing the JSON below. The timestamp in the JSON is just to ensure the messages are different.
{"key":"client1","timestamp":<current_unix_time>}
The Python script is a simple infinite loop, in the format below.
import requests
import time
import random
from multiprocessing import Process
import os
import json
import threading
from time import sleep

def call_rest():
    # Pick a key, add a millisecond timestamp so every message is unique,
    # and POST the JSON to the ListenHTTP endpoint.
    values = ["client1"]
    value = random.choice(values)
    timestamp = round(time.time() * 1000)
    r = requests.post('http://nifi1.field.hortonworks.com:19192/test',
                      data=json.dumps({"key": value, "timestamp": timestamp}))

while True:
    # Fire 5 concurrent requests per iteration, then wait for them to finish
    # before starting the next batch.
    threads = []
    for i in range(5):
        t = threading.Thread(target=call_rest)
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
I ran 5 instances of the script across 8 servers to generate the kind of volume I needed for this test.
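As a side note, since the script already imports multiprocessing.Process, the 5 instances on a box could also be launched from a single parent process instead of 5 separate shell invocations. The sketch below is just that idea, not how I actually ran it; the URL and payload match the script above, everything else is illustrative.

import json
import random
import threading
import time
from multiprocessing import Process
import requests

URL = 'http://nifi1.field.hortonworks.com:19192/test'

def call_rest():
    # Same request as the script above.
    value = random.choice(["client1"])
    timestamp = round(time.time() * 1000)
    requests.post(URL, data=json.dumps({"key": value, "timestamp": timestamp}))

def client_loop():
    # Same pattern as the script above: fire batches of 5 concurrent requests forever.
    while True:
        threads = [threading.Thread(target=call_rest) for _ in range(5)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

if __name__ == "__main__":
    # Launch 5 independent client processes on this box.
    processes = [Process(target=client_loop) for _ in range(5)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()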
Dashboard
Once the data is in Druid, I can use Superset to chart and aggregate the provenance events at an interval of one second. Since the provenance events can take a few minutes to arrive, I used a one-minute window from 5 minutes ago, meaning from t-5 to t-4. I also filtered the query to only look at componentType=ListenHTTP and eventType=RECEIVE. This is what I saw on the chart.
From the chart we can see that the rate fluctuates between a maximum of around 3000 TPS and a minimum of around 600 TPS.
To get a smoother, more even aggregation, I aggregated this over 5-minute intervals across an hour to see what we are doing on average. The chart was pretty promising.
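If you want to query Druid directly rather than through Superset, roughly the same per-second aggregation can be expressed in Druid SQL. The snippet below is a sketch: the datasource name and broker host are placeholders, and I am assuming componentType and eventType are ingested as dimensions (which is how I filtered in Superset).

import requests

sql = """
SELECT FLOOR(__time TO SECOND) AS second, COUNT(*) AS events
FROM "nifi-provenance"
WHERE "componentType" = 'ListenHTTP'
  AND "eventType" = 'RECEIVE'
  AND __time >= CURRENT_TIMESTAMP - INTERVAL '5' MINUTE
  AND __time <  CURRENT_TIMESTAMP - INTERVAL '4' MINUTE
GROUP BY 1
ORDER BY 1
"""

# Per-second RECEIVE counts for the one-minute window from t-5 to t-4.
resp = requests.post("http://<druid-broker>:8082/druid/v2/sql", json={"query": sql})
for row in resp.json():
    print(row["second"], row["events"])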
So on average we are looking at about 300k messages per 5 minutes, which is around 1000 TPS.
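As a quick sanity check on that number:

events_per_bucket = 300_000      # events per 5-minute bucket
bucket_seconds = 5 * 60          # 300 seconds
print(events_per_bucket / bucket_seconds)  # 1000.0 TPS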
Conclusion
The 1000 TPS we see from NiFi in this load test is probably not the maximum load it can handle; I can try running my clients on more servers and see whether we get higher numbers. But at 1000 TPS, NiFi should be able to handle most web-based traffic. Additionally, this is on a cluster with a single NiFi node; we can scale linearly by adding more nodes to the cluster.