New Contributor
Posts: 4
Registered: 06-09-2016

How to update an HBase row key

I know that, technically speaking, updates do not happen in HBase, but is there a way to change the row key of certain rows without modifying the values in those rows? I am trying to find the best way to perform a Get, modify the row key, and then Put the row back with the modified row key. It would also be nice if the timestamp could stay the same as the original. Does anyone have examples of how to do something like this?

Posts: 1,748
Kudos: 364
Solutions: 277
Registered: 07-31-2013

Re: How to update an HBase row key

The Cell API gives you the timestamp of each selected row/column in a Result when reading: http://archive.cloudera.com/cdh5/cdh/5/hbase/apidocs/org/apache/hadoop/hbase/Cell.html#getTimestamp() and the Put API lets you specify one when writing: http://archive.cloudera.com/cdh5/cdh/5/hbase/apidocs/org/apache/hadoop/hbase/client/Put.html#addColu...[])
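Combined, those two APIs are enough to copy a row under a new key while keeping every cell's timestamp. A minimal sketch, assuming an HBase 1.x or later client (where `Put.addColumn(family, qualifier, ts, value)` is available); the class and method name `RowKeyCopy.rekey` are hypothetical:

```java
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;

public class RowKeyCopy {

    /**
     * Copies every cell of source into a new Put for newRow. Family, qualifier,
     * value and timestamp come from each cell unchanged; only the row key differs.
     */
    public static Put rekey(Result source, byte[] newRow) {
        Put put = new Put(newRow);
        if (source.isEmpty()) {
            // Nothing to copy; Table.put rejects an empty Put, so check put.isEmpty() before writing
            return put;
        }
        for (Cell cell : source.rawCells()) {
            put.addColumn(
                    CellUtil.cloneFamily(cell),
                    CellUtil.cloneQualifier(cell),
                    cell.getTimestamp(),         // timestamp read from the old row...
                    CellUtil.cloneValue(cell));  // ...is written back explicitly
        }
        return put;
    }
}
```

Only the versions present in the `Result` are copied, so if older versions have to move as well, read them with a Get or Scan configured to return more versions.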

Row keys are immutable, so what you are looking to do cannot be done in place. I'd recommend running a MapReduce (MR) job that populates a new table by reading and transforming data from the old one. Pre-split the new table adequately for the changed row key format to get better performance during this job.

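After the transformation, you can rename the new table back to the original name if you'd like. HBase has no direct rename command; this is usually done by taking a snapshot of the new table and cloning it under the original name once the original table has been dropped.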

MR input: a TableInputFormat reading from the source table. Your input scan should probably also filter for the specific rows you are targeting.
MR output: a TableOutputFormat writing to the destination table.
Map function: the row key transformer. It copies the contents of the Result's Cell list into a Put, changing only the row key to the new format and keeping all other column data as-is (including timestamps, via the APIs above).
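The job outlined above could look roughly like the following sketch. It assumes the HBase MapReduce integration (`TableMapper`, `TableMapReduceUtil`; in HBase 2.x these ship in the separate hbase-mapreduce artifact), takes the source and destination table names as arguments, and uses a hypothetical `newRowKey` that simply prefixes the old key with `v2|`. Substitute your real key format and a scan filter for the rows you are targeting.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class RekeyJob {

    // Hypothetical new key format (old key prefixed with "v2|"); replace with your own.
    static byte[] newRowKey(byte[] oldRow) {
        return Bytes.add(Bytes.toBytes("v2|"), oldRow);
    }

    /** Map-only transformer: one input row in, one Put under the new key out. */
    public static class RekeyMapper extends TableMapper<ImmutableBytesWritable, Put> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context)
                throws IOException, InterruptedException {
            if (value.isEmpty()) {
                return;
            }
            byte[] newRow = newRowKey(value.getRow());
            Put put = new Put(newRow);
            for (Cell cell : value.rawCells()) {
                // Same family, qualifier, value and timestamp; only the row key changes
                put.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
                        cell.getTimestamp(), CellUtil.cloneValue(cell));
            }
            context.write(new ImmutableBytesWritable(newRow), put);
        }
    }

    // Usage: RekeyJob <sourceTable> <destinationTable>
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(HBaseConfiguration.create(), "rekey " + args[0] + " -> " + args[1]);
        job.setJarByClass(RekeyJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);       // fetch rows in larger batches per RPC
        scan.setCacheBlocks(false); // a one-off full scan should not fill the block cache
        // Add scan.setFilter(...) here to select only the rows being re-keyed.

        TableMapReduceUtil.initTableMapperJob(args[0], scan, RekeyMapper.class,
                ImmutableBytesWritable.class, Put.class, job);
        // No reducer class: the mappers' Puts go straight to TableOutputFormat on the destination.
        TableMapReduceUtil.initTableReducerJob(args[1], null, job);
        job.setNumReduceTasks(0);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Running with zero reducers lets each mapper write its Puts directly through `TableOutputFormat`, which `initTableReducerJob(table, null, job)` configures for the destination table; no shuffle is needed because every output row depends on exactly one input row.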

Alternatively, the destination table can be the same as the source, but then you also need to issue a Delete for the old row key copy at the end of the job/transformation.
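For the same-table variant, each source row can be turned into two mutations: a Put under the new key with the original timestamps, and a Delete for the old key. A minimal sketch, assuming an HBase 1.x or later client; `InPlaceRekey.moveRow` is a hypothetical helper, not part of HBase:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class InPlaceRekey {

    /**
     * Mutations that move one row to newRow inside the same table:
     * a Put copying every cell (original timestamps kept), then a Delete of the old key.
     * Returns an empty list for an empty row or an unchanged key.
     */
    public static List<Mutation> moveRow(Result row, byte[] newRow) {
        List<Mutation> mutations = new ArrayList<>();
        if (row.isEmpty() || Bytes.equals(row.getRow(), newRow)) {
            // A Delete on an unchanged key would also mask the copied cells,
            // because their (original) timestamps are older than the Delete.
            return mutations;
        }
        Put put = new Put(newRow);
        for (Cell cell : row.rawCells()) {
            put.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
                    cell.getTimestamp(), CellUtil.cloneValue(cell));
        }
        mutations.add(put);
        mutations.add(new Delete(row.getRow()));
        return mutations;
    }
}
```

In a mapper, both mutations can be emitted with `context.write(new ImmutableBytesWritable(m.getRow()), m)` (with `Mutation` as the mapper's output value class), since `TableOutputFormat` accepts both Put and Delete. When reading from and writing to the same table, it is also worth restricting the scan to old-format keys, so that rows already written under the new format are not picked up and transformed a second time.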
New Contributor
Posts: 4
Registered: 06-09-2016

Re: How to update an HBase row key

Thanks Harsh!

Any idea if it is possible to keep the same timestamp throughout this process?

New Contributor
Posts: 1
Registered: 12-11-2017

Re: How to update an HBase row key

I have a relatively easy solution to this problem. I created a PairRDD of the rows I wanted to update, then for every row I created a Delete and a Put object, so the job deletes the old record and inserts a new one. The only thing to take care of is that the Put object must use the new row key. Then just call saveAsNewAPIHadoopDataset on the new RDD.
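A sketch of that Spark approach in Java, under stated assumptions: the Spark 2.x Java API (where `flatMapToPair` expects an `Iterator`), HBase's `TableInputFormat`/`TableOutputFormat`, the table name as the first program argument, and a hypothetical `newRowKey` that prefixes the old key with `v2|`. Each row becomes a Delete for the old key plus a Put for the new key carrying the original timestamps, and `saveAsNewAPIHadoopDataset` writes both back through `TableOutputFormat`:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkRekey {

    // Hypothetical new key format: old key prefixed with "v2|".
    static final byte[] NEW_PREFIX = Bytes.toBytes("v2|");

    static byte[] newRowKey(byte[] oldRow) {
        return Bytes.add(NEW_PREFIX, oldRow);
    }

    /** A Delete for the old key and a Put (original timestamps) for the new key. */
    static Iterator<Tuple2<ImmutableBytesWritable, Mutation>> toMutations(Result row) {
        List<Tuple2<ImmutableBytesWritable, Mutation>> out = new ArrayList<>();
        // Skip empty results and rows that are already in the new format
        if (row.isEmpty() || Bytes.startsWith(row.getRow(), NEW_PREFIX)) {
            return out.iterator();
        }
        byte[] oldRow = row.getRow();
        byte[] newRow = newRowKey(oldRow);
        Put put = new Put(newRow);
        for (Cell cell : row.rawCells()) {
            put.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
                    cell.getTimestamp(), CellUtil.cloneValue(cell));
        }
        out.add(new Tuple2<>(new ImmutableBytesWritable(oldRow), new Delete(oldRow)));
        out.add(new Tuple2<>(new ImmutableBytesWritable(newRow), put));
        return out.iterator();
    }

    public static void main(String[] args) throws Exception {
        String table = args[0];
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("rekey-" + table));

        // Read every row of the table as (row key, Result) pairs
        Configuration readConf = HBaseConfiguration.create();
        readConf.set(TableInputFormat.INPUT_TABLE, table);
        JavaPairRDD<ImmutableBytesWritable, Result> rows = sc.newAPIHadoopRDD(
                readConf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);

        // Write the Puts and Deletes back to the same table
        Job job = Job.getInstance(HBaseConfiguration.create());
        job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, table);
        job.setOutputFormatClass(TableOutputFormat.class);

        rows.flatMapToPair(pair -> toMutations(pair._2()))
            .saveAsNewAPIHadoopDataset(job.getConfiguration());
        sc.stop();
    }
}
```

Rows whose key already has the new prefix are skipped, because a job that reads and writes the same table could otherwise encounter its own output and re-key it again.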

New Contributor
Posts: 1
Registered: 04-08-2018

Re: How to update an HBase row key

Do you have the code for your response? That would be awesome!

New Contributor
Posts: 4
Registered: 06-09-2016

Re: How to update an HBase row key

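Hi,

In our case, we needed to update the row key using data that was in another row. Essentially, we grabbed the data from the "good" row and saved it to a variable. Next, we did an HBase Put using that variable, like so:

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Read the current contents of the row stored under the old key
Get get = new Get(Bytes.toBytes(currentRowkey));
Result result = table.get(get);

. . .

// createPut, checkAndPut and Status are our own helpers (not shown here).
// Build a Put for the corrected row key from the cells read above, then write it.
Put dataPut = createPut(Bytes.toBytes(correctedRowkey), hbaseColFamilyFile, hbaseColQualifierFile, result);
Status dataStatus = checkAndPut(currentRowkey, correctedRowkey, hbaseColFamilyFile, hbaseColQualifierFile, table, dataPut, "data");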

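If you'd like to get fancy, you can also call HBase's Table.checkAndPut directly. Passing null as the expected value means the Put is applied only if that column does not already exist in the corrected row (in HBase 2.x this method is deprecated in favor of checkAndMutate):

// Atomically writes dataPut only if the family/qualifier is absent in the corrected row;
// returns true if the Put was applied
boolean written = table.checkAndPut(Bytes.toBytes(correctedRowkey), hbaseColFamilyFile, hbaseColQualifierFile, null, dataPut);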
