Created 06-29-2016 09:17 PM
Hi:
i create this table in hbase with phoenix with 4 column in the row key
CREATE TABLE IF NOT EXISTS journey_oficina_hbase( FECHAOPRCNF VARCHAR not null , CODNRBEENF VARCHAR not null , CODINTERNO VARCHAR not null, CODTXF VARCHAR not null, FREQ INTEGER, IMPORTE DOUBLE CONSTRAINT pk PRIMARY KEY (FECHAOPRCNF, CODNRBEENF,CODINTERNO, CODTXF) );
and fot this query is better tu put first FECHAOPRCNF its a range or CODNRBEENF that is string
"SELECT R.fechaoprcnf as fechaoprcnf,R.codnrbeenf,R.codinterno as codinterno, sum(R.freq) as freq FROM (SELECT fechaoprcnf,codnrbeenf,freq, codinterno,codtxf FROM journey_oficina_hbase WHERE codnrbeenf in ('3008') and codinterno in (''), and (fechaoprcnf >= '", input$fecha[1], " 00:00:00' and fechaoprcnf <= '", input$fecha[2]," 23:00:00')) R GROUP by R.fechaoprcnf, R.codnrbeenf,R.codinterno";
thanks
Created 06-29-2016 09:24 PM
The efficiency of FECHAOPRCNF or CODNRBEENF as the leading column in the rowkey might depend on the cardinality of distinct values. If you have many distinct CODNRBEENF values, you can efficiently prune a large portion of your table. Conversely, if you have very few records in your date range in FECHAOPRCNF, it may make sense to leave that as the leading column in the rowkey.
Either way, you can also use SALT_BUCKETS to make sure you get a good degree of parallelism.
Created 06-29-2016 09:24 PM
The efficiency of FECHAOPRCNF or CODNRBEENF as the leading column in the rowkey might depend on the cardinality of distinct values. If you have many distinct CODNRBEENF values, you can efficiently prune a large portion of your table. Conversely, if you have very few records in your date range in FECHAOPRCNF, it may make sense to leave that as the leading column in the rowkey.
Either way, you can also use SALT_BUCKETS to make sure you get a good degree of parallelism.
Created 06-29-2016 09:34 PM
I have 60 CODNRBEENF and about the fecha i have data every day, so.. ill check the SALT_BUCKETS.
Do you recomend me put 4 column to the row key??
many many thanks
Created 07-01-2016 03:47 PM
60 values of CODNRBEENF per day or in total? If you have 60 unique CODNRBEENF per day, leading with that column would be better. Otherwise, the date is probably better over time.
If you are also querying on CODINTERNO and CODTXF (with FECHAOPRCNF and CODNRBEENF), then it makes sense to include them. It is not a problem to have four columns in the primary key constraint.