Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

multiple row key hbase

Solved Go to solution

multiple row key hbase

Super Collaborator

Hi:

i create this table in hbase with phoenix with 4 column in the row key

CREATE TABLE IF NOT EXISTS journey_oficina_hbase( 
FECHAOPRCNF VARCHAR not null ,
CODNRBEENF VARCHAR not null ,
 CODINTERNO VARCHAR not null,
 CODTXF VARCHAR not null,
 FREQ INTEGER,
IMPORTE DOUBLE
 CONSTRAINT pk PRIMARY KEY (FECHAOPRCNF, CODNRBEENF,CODINTERNO, CODTXF) );

and fot this query is better tu put first FECHAOPRCNF its a range or CODNRBEENF that is string

 "SELECT R.fechaoprcnf as fechaoprcnf,R.codnrbeenf,R.codinterno as codinterno, sum(R.freq) as freq FROM (SELECT fechaoprcnf,codnrbeenf,freq, codinterno,codtxf FROM journey_oficina_hbase 
WHERE codnrbeenf in ('3008') and codinterno in (''),
and (fechaoprcnf >= '", input$fecha[1],
        " 00:00:00' and fechaoprcnf <= '", input$fecha[2]," 23:00:00')) R GROUP by R.fechaoprcnf, R.codnrbeenf,R.codinterno";

thanks

1 ACCEPTED SOLUTION

Accepted Solutions

Re: multiple row key hbase

The efficiency of FECHAOPRCNF or CODNRBEENF as the leading column in the rowkey might depend on the cardinality of distinct values. If you have many distinct CODNRBEENF values, you can efficiently prune a large portion of your table. Conversely, if you have very few records in your date range in FECHAOPRCNF, it may make sense to leave that as the leading column in the rowkey.

Either way, you can also use SALT_BUCKETS to make sure you get a good degree of parallelism.

3 REPLIES 3

Re: multiple row key hbase

The efficiency of FECHAOPRCNF or CODNRBEENF as the leading column in the rowkey might depend on the cardinality of distinct values. If you have many distinct CODNRBEENF values, you can efficiently prune a large portion of your table. Conversely, if you have very few records in your date range in FECHAOPRCNF, it may make sense to leave that as the leading column in the rowkey.

Either way, you can also use SALT_BUCKETS to make sure you get a good degree of parallelism.

Re: multiple row key hbase

Super Collaborator

I have 60 CODNRBEENF and about the fecha i have data every day, so.. ill check the SALT_BUCKETS.

Do you recomend me put 4 column to the row key??

many many thanks

Highlighted

Re: multiple row key hbase

60 values of CODNRBEENF per day or in total? If you have 60 unique CODNRBEENF per day, leading with that column would be better. Otherwise, the date is probably better over time.

If you are also querying on CODINTERNO and CODTXF (with FECHAOPRCNF and CODNRBEENF), then it makes sense to include them. It is not a problem to have four columns in the primary key constraint.

Don't have an account?
Coming from Hortonworks? Activate your account here