
How many salt buckets should I use for my Phoenix tables?

How do upserts of new records impact the number of pre-split regions?

How do updates of existing records impact the number of pre-split regions?


Guru

Since the number of salt buckets can only be set at table creation time, this can be a little tricky. It takes some foresight about what you need from the table, i.e. whether it will be more read-heavy or write-heavy. A neutral stance would be to set the number of salt buckets to the number of HBase RegionServers in your cluster. If you anticipate heavy write loads, consider increasing that to something around {HBase RegionServer Count * 1.20}, which increases the number of buckets by 20% and allows for a more distributed load. Setting the number of salt buckets too high, however, may reduce your flexibility when you perform range-based queries.
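To make the rule of thumb above concrete, here is a minimal Python sketch. The function names, the table layout, and the choice to round up are my own assumptions for illustration, not anything Phoenix provides; the cap of 256 reflects the documented upper limit for SALT_BUCKETS.

```python
MAX_SALT_BUCKETS = 256  # Phoenix accepts SALT_BUCKETS values from 1 to 256


def suggest_salt_buckets(region_servers: int, write_heavy: bool = False) -> int:
    """Rule of thumb: one bucket per RegionServer, plus about 20% for heavy writes.

    Rounding up is an assumption; the rule itself does not say how to round.
    """
    if region_servers < 1:
        raise ValueError("region_servers must be at least 1")
    percent = 120 if write_heavy else 100
    # Integer ceiling of region_servers * percent / 100 (avoids float rounding).
    buckets = (region_servers * percent + 99) // 100
    return min(buckets, MAX_SALT_BUCKETS)


def create_table_ddl(table: str, buckets: int) -> str:
    """Render a Phoenix CREATE TABLE statement with a SALT_BUCKETS clause.

    The columns are hypothetical; only the SALT_BUCKETS clause matters here.
    """
    return (
        f"CREATE TABLE {table} (\n"
        "    id VARCHAR NOT NULL PRIMARY KEY,\n"
        "    payload VARCHAR\n"
        f") SALT_BUCKETS = {buckets}"
    )


if __name__ == "__main__":
    # Example: a 10-node cluster expecting a write-heavy workload.
    print(create_table_ddl("EVENTS", suggest_salt_buckets(10, write_heavy=True)))
```

Because the bucket count is fixed once the table exists, it is worth running the numbers for the cluster size you expect to have later, not just the one you have today.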

@Jeremy Dyer -- updated the question with additional items - any comments on those?

I was recently in a discussion with @Rajeshbabu Chintaguntla about this. If your table is relatively small compared to the amount of block cache you have available (e.g. if you can cache your entire table), it makes sense to limit the number of salt buckets to the number of region servers you have (similar to Jeremy's recommendation). However, once you get to much larger tables, ones that will definitely not fit into cache and will require disk access, you're going to benefit from having more buckets available to distribute the load across multiple regions per server. I believe the consensus we reached was that something along the lines of 64 to 128 salt buckets would be a good starting point for tens of region servers.

Obviously, this also depends a lot on the number of region servers you're using and on the other users of HBase. If you're the only one using a 50-node HBase cluster, the recommendations would be vastly different than if you were one of 10 users sharing a 25-node HBase cluster. "It depends" 🙂
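As a rough illustration of what 64 to 128 buckets means per server: a salted table starts out with one region per bucket, so the initial regions per server is just a division. The 20-server cluster below is a made-up figure, and the sketch assumes the HBase balancer spreads regions evenly.

```python
def initial_regions_per_server(salt_buckets: int, region_servers: int) -> float:
    """Average number of initial regions each server hosts for one salted table.

    Assumes one region per salt bucket and an even spread across servers.
    """
    if salt_buckets < 1 or region_servers < 1:
        raise ValueError("both arguments must be at least 1")
    return salt_buckets / region_servers


if __name__ == "__main__":
    servers = 20  # hypothetical cluster size
    for buckets in (20, 64, 128):
        avg = initial_regions_per_server(buckets, servers)
        print(f"{buckets} buckets on {servers} servers: {avg:.1f} regions per server")
```

With one bucket per server each server holds a single region of the table, while 64 or 128 buckets give each server several regions to work with, which is the point of the larger-table recommendation.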

Rising Star

Consider the following points when deciding on the number of salt buckets:

  1. The number of region servers available
  2. The expected write throughput
  3. The HBase row key itself (if it is already random enough not to cause hotspots, I would suggest pre-splitting without salting to get better scans)
  4. Increasing the salt buckets to a high number may result in slower scans (depending on table size and scan)
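Points 3 and 4 are easier to see with a small sketch of what salting does to a row key. This is a simplified model, not Phoenix's implementation: `zlib.crc32` is a stand-in for Phoenix's own hash, and all function names here are hypothetical.

```python
import zlib


def salt_byte(row_key: bytes, buckets: int) -> int:
    """Pick a bucket for a row key.

    zlib.crc32 is a stand-in hash for illustration, NOT the hash Phoenix uses.
    """
    return zlib.crc32(row_key) % buckets


def salted_key(row_key: bytes, buckets: int) -> bytes:
    """Prefix the row key with its one-byte salt, as a salted table stores it."""
    return bytes([salt_byte(row_key, buckets)]) + row_key


def scans_for_range(start: bytes, stop: bytes, buckets: int) -> list:
    """A range over the logical key turns into one (start, stop) scan per bucket,
    because rows of that range can live under any salt prefix."""
    return [(bytes([b]) + start, bytes([b]) + stop) for b in range(buckets)]


if __name__ == "__main__":
    buckets = 8
    # Monotonically increasing keys would otherwise all land in the last region.
    for i in range(5):
        key = f"2016-01-01T00:00:0{i}".encode()
        print(key.decode(), "-> bucket", salt_byte(key, buckets))
    n = len(scans_for_range(b"2016-01-01", b"2016-01-02", buckets))
    print("range query fans out into", n, "scans")
```

The fan-out grows with the bucket count, which is why a very high number of buckets can slow range scans, and why a key that is already well distributed may be better served by plain pre-splitting.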