Created 05-03-2016 05:25 PM
Hi:
How can I insert from Pig into HBase with an auto-incrementing row key?
STORE d INTO 'hbase://canal_partitioned_v2' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('fijo:codtf,fijo:canal,fijo:fechaoprcnf,fijo:frecuencia,fijo:codnrbeenf');
Created 05-03-2016 11:30 PM
Auto-incrementing row keys will cause hotspotting. You want to create row keys that are not sequential; in fact, they should be as random as possible. HBase does not like monotonically increasing row keys.
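For example, prefixing the natural key with a few characters of its hash spreads writes across regions. A minimal Pig sketch, assuming DataFu's MD5 UDF is available (the jar path and field names below are assumptions; adapt them to your data):

```
-- Path to the DataFu jar is an assumption; adjust for your cluster
REGISTER /path/to/datafu.jar;
DEFINE MD5 datafu.pig.hash.MD5();

-- Prepend 4 hex chars of the key's MD5 so consecutive natural keys
-- land in different regions instead of all hitting one region server
salted = FOREACH d GENERATE
             CONCAT(SUBSTRING(MD5((chararray)codtf), 0, 4),
                    CONCAT('_', (chararray)codtf)) AS rowkey,
             canal, fechaoprcnf, frecuencia, codnrbeenf;
```

The trade-off is that range scans over the natural key are no longer possible, since the hash prefix scatters adjacent keys.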
Created 05-04-2016 05:41 AM
Hi:
Thanks for that. So the row key needs to be unique and not sequential...
The problem is, I have a good row key, but it is not unique.
Is there any default row key pattern I can apply?
Thanks
Created 05-04-2016 06:44 AM
You can create composite keys from a combination of fields, for example ID|timestamp|another-field; you can use built-in Pig functions to create that row key. You can also pre-split the table, and then your non-unique key may be OK. It all depends on how often you insert too much data; for a one-time load it is fine, and you can move regions around after the insert. Take a look at the row key design section of the HBase book (the link is to an old version, but the concepts are the same): https://hbase.apache.org/0.94/book/rowkey.design.html
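As a sketch of that composite-key idea using only Pig's built-in CONCAT (which takes two arguments, hence the nesting): the relation and field names are taken from the earlier script, and the '|' separator is an arbitrary choice.

```
-- Build rowkey as codnrbeenf|fechaoprcnf|canal from the grouped relation c
d = FOREACH c GENERATE
        CONCAT((chararray)group.$1,
          CONCAT('|',
            CONCAT((chararray)group.$2,
              CONCAT('|', (chararray)group.$3)))) AS rowkey,
        (chararray)group.$0 AS codtf,
        (int)COUNT(b)       AS frecuencia;

-- The first field of each tuple becomes the HBase row key;
-- only the remaining fields are listed as columns
STORE d INTO 'hbase://canal_partitioned_v2'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('fijo:codtf,fijo:frecuencia');
```

Pre-splitting is done from the HBase shell when creating the table, e.g. create 'canal_partitioned_v2', 'fijo', SPLITS => ['a','m','t'] (the split points here are an assumption; pick them from your actual key distribution).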
Created 05-04-2016 09:59 AM
Hi:
Finally, I created a unique id like this:
d = FOREACH c GENERATE UniqueID() AS id, (chararray)group.$3 AS canal, (chararray)group.$0 AS codtf, (chararray)group.$2 AS fechaoprcnf, (int)COUNT(b) AS frecuencia, (chararray)group.$1 AS codnrbeenf;
STORE d INTO 'hbase://canal_partitioned_v2' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('id,fijo:canal,fijo:codtf,fijo:fechaoprcnf,fijo:frecuencia,fijo:codnrbeenf');
and in HBase it looks like this; I think the row id is not useful:
hbase(main):062:0> scan 'canal_partitioned_v2'
ROW   COLUMN+CELL
0-0   column=fijo:canal, timestamp=1462355983610, value=BDPPM1KK
0-0   column=fijo:codtf, timestamp=1462355983610, value=2016-03-29
0-0   column=fijo:fechaoprcnf, timestamp=1462355983610, value=1
0-0   column=fijo:frecuencia, timestamp=1462355983610, value=3067
0-0   column=fijo:id, timestamp=1462355983610, value=03
0-1   column=fijo:canal, timestamp=1462355983615, value=BDPPM1KK
0-1   column=fijo:codtf, timestamp=1462355983615, value=2016-03-29
0-1   column=fijo:fechaoprcnf, timestamp=1462355983615, value=1
0-1   column=fijo:frecuencia, timestamp=1462355983615, value=3191
0-1   column=fijo:id, timestamp=1462355983615, value=03
0-2   column=fijo:canal, timestamp=1462355983615, value=BDPPM1RG
0-2   column=fijo:codtf, timestamp=1462355983615, value=2016-03-29
0-2   column=fijo:fechaoprcnf, timestamp=1462355983615, value=1
0-2   column=fijo:frecuencia, timestamp=1462355983615, value=3059
0-2   column=fijo:id, timestamp=1462355983615, value=03
0-3   column=fijo:canal, timestamp=1462355983616, value=DVI51OOU
0-3   column=fijo:codtf, timestamp=1462355983616, value=2016-03-29
0-3   column=fijo:fechaoprcnf, timestamp=1462355983616, value=2
0-3   column=fijo:frecuencia, timestamp=1462355983616, value=1554
My IMPORTANT question is: for aggregations, frequencies, or word clouds, I think HBase is not a good fit, right?
thanks
Created 05-04-2016 10:11 AM
For logs, is it OK to use timestamp + UniqueID(), for example?
Created 05-05-2016 07:14 AM
HBase is great for random key lookups. I've worked on a project where a word cloud powered by HBase worked just fine. If you have a dashboard, HBase, or perhaps Phoenix, works pretty well behind it.