05-19-2017 10:50 AM
I've read the general advice to avoid more than 2-3 column families on an HBase table because of the way flushing works.
I've also come across an HBase ticket that makes flush decisions per column family, so a family that doesn't have enough data isn't flushed and doesn't cause extra I/O.
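To make sure I understand the change, here is a toy Python model of the two behaviors. It is only a sketch of my reading of the ticket, not actual HBase code: the function names, family names, in-memory sizes, and the 16 MB lower bound are all made up for illustration, and the fallback of flushing everything when no family is large enough is my assumption about how the selective behavior works.

```python
def flush_all(memstores):
    """Older behaviour: a region flush writes out every family together."""
    return sorted(memstores)


def flush_selective(memstores, lower_bound):
    """Per-family behaviour (simplified): flush only families whose
    in-memory size is at or above lower_bound. If none qualifies,
    fall back to flushing every family (an assumption of this sketch)."""
    large = sorted(f for f, size in memstores.items() if size >= lower_bound)
    return large or sorted(memstores)


# Hypothetical in-memory sizes in MB for three families of one region.
memstores = {"big": 120, "meta": 6, "tags": 2}

print(flush_all(memstores))                        # ['big', 'meta', 'tags']
print(flush_selective(memstores, lower_bound=16))  # ['big']
```

In the first case the two small families each produce a tiny file on every flush, which later has to be compacted; in the second case only the large family is written.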
It looks like that patch was applied in HBase v1.1, and we're using HBase v1.2. With this flushing change, are there still problems with having more than 2-3 families?
05-25-2017 08:01 AM
I also had somebody here contact Cloudera directly. Here's the response we got.
I see you have found an HBase JIRA issue concerning compaction and other I/O for column families of certain sizes, which may affect the recommendation on the number of column families per table.
As it currently stands, Cloudera still recommends a maximum of two or three column families. This aligns with the Apache HBase documentation for HBase v1.2, the version that currently ships with CDH 5.11.x.
Even the Apache documentation for the next version of HBase still references a maximum of two to three column families. While this recommendation may change in the future, Cloudera currently aligns with the HBase community recommendation regarding column families.