Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

How does HBase write performance differ from write performance in Cassandra with consistency level ALL?

avatar
New Contributor
 
2 REPLIES 2

avatar

Choosing a W=N quorum in Cassandra is suicidal. It means that any single node going down will block write updates to all the contained shards. Which is why when talking about Cassandra strong consistency options - with N=3, typically experts will recommend W=2, R=2.

So imho - W=3, R=1 is not feasible in Cassandra and there's no practical equivalent to the way Hbase / Hdfs achieve this. Note that the reason this is not a problem with Hdfs is that the write quorum is reconfigured on datanode failures. (But to be fair this comes with the availability/performance tradeoffs of a centralized namenode).

Dynamo attempted to deal with this relaxing the definition of the quorum group (from a fixed N nodes to N contiguous available nodes) (this part is controversial and different people read the paper differently). But that throws all notions of strong consistency to the wind and when Cassandra folks talk about strong consistency and quorums - they do not (seem to) use Dynamo's definitions.

For more information. Please Click here:

http://mindmajix.com/hbase-training

avatar

While setting the a write consistency level of ALL with a read level of ONE in Cassandra provides a strong consistency model similar to what HBase provides (and in fact using quorum writes and reads would as well), the two operations are actually semantically different and lead to different durability and availability guarantees. In Cassandra, writing with a consistency level of ALL means that the data will be written to all N nodes responsible for the particular piece of data, where N is the replication factor, before the client gets a response. In a standard Cassandra configuration, the write goes into an in-memory table and an in-memory log for each node. The log is periodically batch flushed to disk; there is also an option to flush per commit, but this option severely impacts performance. Subsequent reads from any node are strongly consistent and get the most recent update. In contrast, HBase has only one region server responsible for serving a given piece of data at any one time, and replication is handled on the HDFS layer. A client sends an update to the region server currently responsible for the update key, and the region server responds with an ack as soon as it updates its in-memory data structure and flushes the update to its write-ahead commit log. In older versions of HBase, the log was configured in a similar manner to Cassandra to flush periodically. As a few commenters have pointed out, the default configuration of more recent versions of HBase flush the commit log before acknowledging writes to the client, using group commit to batch flushes across writes for performance. Replication to the N HDFS nodes responsible for the written data still happens asynchronously, however. HBase ensures strong consistency by routing subsequent reads through the same region server and, if a region server goes down, by using a system of locks based on ZooKeeper so that reads take into account the latest update. Because Cassandra writes data synchronously to all N nodes in this scheme whereas HBase writes data synchronously to only one node, Cassandra is necessarily slower. In this scheme, write latency in Cassandra is essentially bottlenecked by the slowest machine and subject to variance in network speeds, IO speeds, and CPU loads across machines.

The tradeoff comes in availability. Because only the write-ahead log has been replicated to the other HDFS nodes, if the region server that accepted the write fails, the ranges of data it was serving will be temporarily unavailable until a new server is assigned and the log is replayed. On the other hand, Cassandra will still have and serve the data (given the read level of ONE) even if N-1 nodes responsible for the data go down.