Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

hbase prespliting question

hbase prespliting question

Master Collaborator

on this forum I saw a following note :

HBase key itself (If it is random enough(not to cause hotspots) than i will suggest pre-splitting without salting to get better scans)

lets say I have a table with varying integer values 14, 1333, 33, 31232 ... etc how can I presplit them in hbase or phoenix ?

2 REPLIES 2

Re: hbase prespliting question

@Sami Ahmad

Using Phoenix:

  • Use Salting to increase read/write performance Salting can significantly increase read/write performance by pre-splitting the data into multiple regions. Although Salting will yield better performance in most scenarios.

Example:

CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SALT_BUCKETS=16

Note: Ideally for a 16 region server cluster with quad-core CPUs, choose salt buckets between 32-64 for optimal performance.

  • Per-split table Salting does automatic table splitting but in case you want to exactly control where table split occurs with out adding extra byte or change row key order then you can pre-split a table.

Example:

CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SPLIT ON ('CS','EU','NA')

  • Use multiple column families

Column family contains related data in separate files. If you query use selected columns then it make sense to group those columns together in a column family to improve read performance.

Example:

Following create table DDL will create two column faimiles A and B.

CREATE TABLE TEST (MYKEY VARCHAR NOT NULL PRIMARY KEY, A.COL1 VARCHAR, A.COL2 VARCHAR, B.COL3 VARCHAR)

Article:

https://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/

Re: hbase prespliting question

Master Collaborator

Per-split table Salting does automatic table splitting but in case you want to exactly control where table split occurs with out adding extra byte or change row key order then you can pre-split a table.

Example:

CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SPLIT ON ('CS','EU','NA')


but my values are random integers not characters .. how can I pre-split random numbers?
Don't have an account?
Coming from Hortonworks? Activate your account here