Created on 04-18-2016 06:47 PM - edited 08-17-2019 12:42 PM
This part 1 article is an introduction to Burrow's design, an overview of the features meant to address some of Kafka's known monitoring challenges, and a step-by-step tutorial for setting up Burrow on the sandbox. The part 2 article will explore consumer lag evaluation rules, the HTTP endpoint APIs, and the email and HTTP notifiers. Stay tuned for part 2, to be released in May.
The source reference is Burrow’s documentation wiki: https://github.com/linkedin/Burrow/wiki
Collecting metrics helps with troubleshooting and lets you set alerts on the ones that require action. Metrics worth tracking include:
- Host and process metrics: process state, memory usage, swap usage, network bandwidth, disk usage, disk IO
- Broker metrics: under-replicated partitions, offline partitions, active controller brokers, incoming messages per second, incoming/outgoing bytes per second, requests per second, total and split time to process a request, disputed leader elections rate, asynchronous disk log flush and time, unclean leader election rate, number of partitions, ISR shrink/expansion rate, network processor average idle %
- Consumer metrics: messages by which the consumer lags behind the producer (MaxLag), minimum rate at which the consumer sends requests to the Kafka broker, messages consumed per second per consumer topic, bytes consumed per second, rate at which the consumer commits offsets to Kafka, partitions owned by a consumer
The standard Kafka consumer has a built-in metric to track MaxLag; however, it has several flaws: the metric is reported by the consumer itself, so it disappears as soon as the consumer stops running, and it only reflects the single most-lagged partition rather than every partition the consumer owns.
Burrow is under active development by the Data Infrastructure Streaming SRE team at LinkedIn. It is written in Go, published under the Apache License, and hosted at https://github.com/linkedin/Burrow.
Burrow's high level design is presented in the diagram below.
Burrow automatically monitors all consumers and every partition that they consume. It does this by consuming the special internal Kafka topic to which consumer offsets are written. Burrow then provides consumer information as a centralized service that is separate from any single consumer. Consumer status is determined by evaluating the consumer's behavior over a sliding window. For each partition, data is recorded to answer questions such as: Is the consumer committing offsets? Are the committed offsets increasing? Is the lag increasing, and is it increasing consistently or only fluctuating?
The information is distilled down into a status for each partition, and then into a single status for the consumer. A consumer is either OK, in a WARNING state (the consumer is working but falling behind), or in an ERROR state (the consumer has stopped or stalled). This status is available through a simple HTTP request to Burrow, or it can be periodically checked and sent out via email or to a separate HTTP endpoint (such as a monitoring or notification system). The HTTP request endpoints for getting information about the Kafka cluster and consumers, separate from the lag status, are very useful for applications that assist with managing Kafka clusters when it is not possible to run a Java Kafka client.
For example, if we have configured Burrow with a Kafka cluster named local that has a consumer group named kafkamirror_aggregate, a simple HTTP GET request to Burrow using the path /v2/kafka/local/consumer/kafkamirror_aggregate/status can show us that the consumer is working correctly:
{"error":false,"message":"consumer group status returned","status":{"cluster":"local","group":"kafkamirror_aggregate","status":"OK","complete":true,"partitions":[ ]}}
It can also show us when the consumer is not working correctly, and specifically which topics and partitions are having problems:
{"error":false,"message":"consumer group status returned","status":{"cluster":"local","group":"kafkamirror_aggregate","status":"WARN","complete":true,"partitions":[{"topic":"very_busy_topic","status":"WARN","partition":1,"start":{"timestamp":1433033218951,"lag":248314281,"offset":303081219},"end":{"timestamp":1433033758950,"lag":251163129,"offset":3035669403}}]}}
Burrow was created to address problems LinkedIn had with standard Kafka consumer monitoring, in particular for wildcard consumers such as mirror makers and audit consumers. Instead of periodically checking offsets for specific consumers, it monitors the stream of all committed offsets and continually calculates lag over a sliding window.
Burrow is currently limited to monitoring consumers that are using Kafka-committed offsets. This method (new in Apache Kafka 0.8.2) replaces the previous method of committing offsets to Zookeeper.
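On the HDP sandbox, consumers that commit offsets to Kafka write them to the internal __consumer_offsets topic. If at least one consumer in the cluster is already committing offsets this way, the topic shows up when listing topics (the script location below is the usual HDP path and is an assumption to verify on your sandbox):
$ /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper localhost:2181 --list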
For this demo I used the HDP 2.4 sandbox, which runs on CentOS release 6.7, and I executed the following commands with sudo.
1. Install and set up Go.
Change to /tmp and download the latest Go release for Linux 64-bit.
$ cd /tmp
$ wget https://storage.googleapis.com/golang/go1.6.linux-amd64.tar.gz
Extract the binary files to /usr/local/go.
$ tar -C /usr/local -xzf /tmp/go1.6.linux-amd64.tar.gz
For easy access, symlink your installed binaries in /usr/local/go to /usr/local/bin, which should be in your default $PATH in your shell.
$ ln -s /usr/local/go/bin/go /usr/local/bin/go
$ ln -s /usr/local/go/bin/godoc /usr/local/bin/godoc
$ ln -s /usr/local/go/bin/gofmt /usr/local/bin/gofmt
$ export GOROOT=/usr/local/go
Add the GOROOT/bin directory to your $PATH by adding the following line to your ~/.profile file.
$ export PATH=$PATH:$GOROOT/bin
You now have a working Go binary for version 1.6 (the latest version at the time of this article).
$ go version
go version go1.6 linux/amd64
Create a workspace for Go projects
$ mkdir /workspace
$ cd /workspace
$ mkdir go
$ export GOPATH=/workspace/go
$ cd go
$ mkdir src
$ cd src
$ mkdir github.com
$ cd github.com
$ mkdir linkedin
$ cd linkedin
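If you prefer, the same workspace layout can be created in one step with mkdir -p:
$ mkdir -p /workspace/go/src/github.com/linkedin
$ export GOPATH=/workspace/go
$ cd $GOPATH/src/github.com/linkedin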
2. Install the latest version of the Go Package Manager (GPM). GPM is used to automatically pull in Burrow's dependencies so you don't have to deal with that complexity.
$ cd /tmp
$ wget https://raw.githubusercontent.com/pote/gpm/v1.4.0/bin/gpm && chmod +x gpm && sudo mv gpm /usr/local/bin
3. Install git client:
$ yum install git
$ git --version
git version 1.7.1
Note: the git client will also be needed in step 5.
4. Clone Burrow repository:
$ cd $GOPATH/src/
$ go get github.com/linkedin/burrow
5. Build and install Burrow:
$ export GOBIN=$GOPATH/bin
$ cd /workspace/go/src/github.com/linkedin/burrow
$ gpm install
Because of a change made about eight months ago, it seems that gpm is unable to bring in all the dependencies; the gcfg package was migrated to a different repository, gopkg.in/gcfg.v1. The workaround is to clone the missing package from GitHub:
$ cd $GOPATH/src/gopkg.in
$ git clone https://github.com/go-gcfg/gcfg.git
Then:
$ cd $GOPATH/src/github.com/linkedin/burrow
$ mv $GOPATH/src/gopkg.in/gcfg/ $GOPATH/src/gopkg.in/gcfg.v1
$ go install
Finally, the executable can be found in $GOBIN.
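You can verify the build by listing $GOBIN; the binary produced by go install for this package is named burrow:
$ ls -l $GOBIN/burrow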
6. Run Burrow:
Go to $GOPATH/src/github.com/linkedin/burrow/config, save burrow.cfg as burrow.cfg.orig, and then edit burrow.cfg to match your environment. For simplicity, copy the edited file to $GOBIN; alternatively, you can point Burrow at the config file's location with the --config option shown below.
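Put together, the steps above look roughly like this (a sketch; vi is just an example editor, and copying to $GOBIN is optional as noted above):
$ cd $GOPATH/src/github.com/linkedin/burrow/config
$ cp burrow.cfg burrow.cfg.orig
$ vi burrow.cfg    # adjust hosts, ports, and paths to match your sandbox
$ cp burrow.cfg $GOBIN/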
$ $GOPATH/bin/burrow --config path/to/burrow.cfg
For information on how to write your configuration file, check out https://github.com/linkedin/Burrow/wiki/Configuration
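Once Burrow is up, a quick sanity check is to ask its HTTP server for the list of configured Kafka clusters; the port below assumes 8000 was set in the httpserver section of your burrow.cfg, so adjust it to your own configuration:
$ curl http://localhost:8000/v2/kafka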
Created on 06-03-2016 01:58 PM
Hello,
Thank you very much! The workaround about gopkg.in/gcfg.v1 was really useful.
Nevertheless, you should verify some commands that were not working for me. For example:
$ yum install -> needs sudo
$ cd $GOPATH/src/github.com/linkedin/Burrow -> should be burrow
You need to execute this command before "go install"
mv $GOPATH/src/gopkg.in/gcfg/ $GOPATH/src/gopkg.in/gcfg.v1
Otherwise you will get
config.go:16:2: cannot find package "gopkg.in/gcfg.v1" in any of:
Also, I think it is not required to "add the GOROOT/bin directory to your $PATH" because you are already creating the symbolic links, and on the other hand, you may use "sudo yum install golang".
Thanks again, this post helped me a lot.
Created on 06-11-2016 11:03 PM
Thank you so much for your review. Your findings were spot-on. I had a few typos and omitted a mv command. Excellent catches.