Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

multiple row key hbase

avatar
Master Collaborator

Hi:

i create this table in hbase with phoenix with 4 column in the row key

CREATE TABLE IF NOT EXISTS journey_oficina_hbase( 
FECHAOPRCNF VARCHAR not null ,
CODNRBEENF VARCHAR not null ,
 CODINTERNO VARCHAR not null,
 CODTXF VARCHAR not null,
 FREQ INTEGER,
IMPORTE DOUBLE
 CONSTRAINT pk PRIMARY KEY (FECHAOPRCNF, CODNRBEENF,CODINTERNO, CODTXF) );

and fot this query is better tu put first FECHAOPRCNF its a range or CODNRBEENF that is string

 "SELECT R.fechaoprcnf as fechaoprcnf,R.codnrbeenf,R.codinterno as codinterno, sum(R.freq) as freq FROM (SELECT fechaoprcnf,codnrbeenf,freq, codinterno,codtxf FROM journey_oficina_hbase 
WHERE codnrbeenf in ('3008') and codinterno in (''),
and (fechaoprcnf >= '", input$fecha[1],
        " 00:00:00' and fechaoprcnf <= '", input$fecha[2]," 23:00:00')) R GROUP by R.fechaoprcnf, R.codnrbeenf,R.codinterno";

thanks

1 ACCEPTED SOLUTION

avatar
Super Guru

The efficiency of FECHAOPRCNF or CODNRBEENF as the leading column in the rowkey might depend on the cardinality of distinct values. If you have many distinct CODNRBEENF values, you can efficiently prune a large portion of your table. Conversely, if you have very few records in your date range in FECHAOPRCNF, it may make sense to leave that as the leading column in the rowkey.

Either way, you can also use SALT_BUCKETS to make sure you get a good degree of parallelism.

View solution in original post

3 REPLIES 3

avatar
Super Guru

The efficiency of FECHAOPRCNF or CODNRBEENF as the leading column in the rowkey might depend on the cardinality of distinct values. If you have many distinct CODNRBEENF values, you can efficiently prune a large portion of your table. Conversely, if you have very few records in your date range in FECHAOPRCNF, it may make sense to leave that as the leading column in the rowkey.

Either way, you can also use SALT_BUCKETS to make sure you get a good degree of parallelism.

avatar
Master Collaborator

I have 60 CODNRBEENF and about the fecha i have data every day, so.. ill check the SALT_BUCKETS.

Do you recomend me put 4 column to the row key??

many many thanks

avatar
Super Guru

60 values of CODNRBEENF per day or in total? If you have 60 unique CODNRBEENF per day, leading with that column would be better. Otherwise, the date is probably better over time.

If you are also querying on CODINTERNO and CODTXF (with FECHAOPRCNF and CODNRBEENF), then it makes sense to include them. It is not a problem to have four columns in the primary key constraint.