How to update an HBase row key
Labels: Apache HBase
Created on ‎09-09-2016 08:59 AM - edited ‎09-16-2022 03:38 AM
I know that, technically speaking, updates in HBase do not happen, but is there a way to change the row key of certain rows without modifying the values in those rows? I am trying to find the best way to perform a get, modify the row key, and then put the row back into place with the modified row key. It would also be nice if the timestamp could stay the same as the original. Does anyone have any examples of how to perform something like this?
Created ‎09-11-2016 04:06 AM
Row keys are immutable, so what you are looking to do cannot be done in-place. I'd recommend running an MR job to populate a new table sourcing and transforming data from the older one. Pre-split the newer table adequately with the changed row key format for better performance during this job.
After the transformation you can rename the new table back to the original name if you'd like.
The MR input would be a TableInputFormat over the source table.
The table input scan should likely also filter for the rows you are specifically targeting.
The MR output would be a TableOutputFormat for the destination table.
The map function would hold the row key transformation code: it transfers the Result's Cell list contents into a Put built on the new row key format, while retaining all other columnar data as-is via the above APIs (see the sketch below).
Alternatively, your destination table can be the same as the source, but then also issue a Delete for the older row key copy at the end of the job/transformation.
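For reference, a minimal sketch of that MR job might look like the following. It is a map-only job built with TableMapReduceUtil; the table names ("old_table"/"new_table"), the newKeyFor(...) transform, and the scan settings are assumptions to adapt to your own schema:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class RowKeyRewriteJob {

  // Mapper: rebuild each source row under its new key, copying
  // family/qualifier/timestamp/value for every cell.
  public static class RewriteMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      byte[] newKey = newKeyFor(row.copyBytes()); // hypothetical key transform
      Put put = new Put(newKey);
      for (Cell cell : value.rawCells()) {
        // Copy the pieces rather than re-using the Cell, which still carries the old key.
        put.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
            cell.getTimestamp(), CellUtil.cloneValue(cell));
      }
      context.write(new ImmutableBytesWritable(newKey), put);
    }

    // Placeholder transform: prefix the old key. Replace with your real logic.
    private static byte[] newKeyFor(byte[] oldKey) {
      return Bytes.add(Bytes.toBytes("new-"), oldKey);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hbase-rowkey-rewrite");
    job.setJarByClass(RowKeyRewriteJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);
    scan.setCacheBlocks(false); // recommended for MR scans over HBase
    // Add a filter here if only a subset of rows needs the new key.

    TableMapReduceUtil.initTableMapperJob("old_table", scan, RewriteMapper.class,
        ImmutableBytesWritable.class, Put.class, job);
    TableMapReduceUtil.initTableReducerJob("new_table", null, job);
    job.setNumReduceTasks(0); // map-only copy; writes go straight to the new table

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Pre-splitting the new table with the changed key format, as suggested above, will help the write throughput of this job.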
Created ‎09-15-2016 06:59 AM
Thanks Harsh!
Any idea if it is possible to keep the same timestamp throughout this process?
Created ‎12-11-2017 11:00 AM
I have a relatively easy solution to this problem. I created a PairRDD of the rows that I wanted to update. Then, for every row, I created a Delete and a Put object, so it deletes the old record and inserts a new one. The only thing to take care of is that the Put object should include the new row key. Then just call saveAsNewAPIHadoopDataset on the new RDD.
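A minimal sketch of that approach in Java (Spark 2.x API) might look like the following; the table name ("my_table") and the rewriteKey(...) transform are assumptions:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkRowKeyRewrite {

  // Placeholder transform: prefix the old key. Replace with your real logic.
  private static byte[] rewriteKey(byte[] oldKey) {
    return Bytes.add(Bytes.toBytes("new-"), oldKey);
  }

  public static void main(String[] args) throws Exception {
    JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("rowkey-rewrite"));

    Configuration conf = HBaseConfiguration.create();
    conf.set(TableInputFormat.INPUT_TABLE, "my_table");
    conf.set(TableOutputFormat.OUTPUT_TABLE, "my_table");
    Job job = Job.getInstance(conf);
    job.setOutputFormatClass(TableOutputFormat.class);

    // Rows to fix; restrict them with TableInputFormat scan properties if needed.
    JavaPairRDD<ImmutableBytesWritable, Result> rows = sc.newAPIHadoopRDD(
        conf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);

    // For every row: a Put under the new key plus a Delete for the old key.
    JavaPairRDD<ImmutableBytesWritable, Mutation> mutations = rows.flatMapToPair(t -> {
      byte[] oldKey = t._1.copyBytes();
      byte[] newKey = rewriteKey(oldKey);
      Put put = new Put(newKey);
      for (Cell cell : t._2.rawCells()) {
        put.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
            cell.getTimestamp(), CellUtil.cloneValue(cell)); // keeps the original timestamps
      }
      Delete delete = new Delete(oldKey);
      return Arrays.asList(
          new Tuple2<>(new ImmutableBytesWritable(newKey), (Mutation) put),
          new Tuple2<>(new ImmutableBytesWritable(oldKey), (Mutation) delete)).iterator();
    });

    mutations.saveAsNewAPIHadoopDataset(job.getConfiguration());
    sc.close();
  }
}

Because each copied cell is written with its original timestamp (cell.getTimestamp()), the values keep their timestamps through the rewrite.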
Created ‎06-12-2018 08:23 AM
Do you have the code for your response? That would be awesome.
Created ‎06-12-2018 08:37 AM
Hi,
In our case, it was a matter of updating the rowkey using data that was in another row. So essentially, we grabbed the data from the "good" row, and saved it to a variable. Next, we did an HBase put using that variable like so:
Get get = new Get(Bytes.toBytes(currentRowkey));
Result result = table.get(get);
. . .
// createPut and checkAndPut below are helper methods from our own code (not shown).
Put dataPut = createPut(Bytes.toBytes(correctedRowkey), hbaseColFamilyFile, hbaseColQualifierFile, result);
Status dataStatus = checkAndPut(currentRowkey, correctedRowkey, hbaseColFamilyFile, hbaseColQualifierFile, table, dataPut, "data");
If you'd like to get fancy, you can also do a checkAndPut directly on the table, like so:
// Passing null as the expected value means the put only succeeds if that column
// does not already exist under the corrected row key.
table.checkAndPut(Bytes.toBytes(correctedRowkey), hbasecolfamily, hbasecolqualifier, null, put);
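The createPut and checkAndPut calls above are the poster's own helper methods and are not shown in the post. As a rough, hypothetical sketch (assuming byte[] family/qualifier arguments), createPut might simply copy the fetched value and timestamp under the corrected key, something like:

// Hypothetical helper: copy the fetched cell's value (and timestamp) into the
// same column under the corrected row key.
private static Put createPut(byte[] correctedRowkey, byte[] family, byte[] qualifier, Result result) {
  Put put = new Put(correctedRowkey);
  Cell cell = result.getColumnLatestCell(family, qualifier); // assumes the column exists
  put.addColumn(family, qualifier, cell.getTimestamp(), CellUtil.cloneValue(cell));
  return put;
}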
Created on ‎05-27-2019 06:35 AM - edited ‎05-27-2019 06:37 AM
Small note that's relevant to this (older) topic:
When copying Cells from a fetched Scan/Get Result into a new Put with the altered key, do not add the Cell objects as-is via the Put::addCell(…) API. Instead, copy over the individual portions (family, qualifier, timestamp, value).
A demo program for a single key operation would look like this:
public static void main(String[] args) throws Exception {
  Configuration conf = HBaseConfiguration.create();
  Connection connection = ConnectionFactory.createConnection(conf);
  Table sourceTable = connection.getTable(TableName.valueOf("old_table"));
  Table destinationTable = connection.getTable(TableName.valueOf("new_table"));

  // Fetch the row under its old key.
  Result result = sourceTable.get(new Get(Bytes.toBytes("old-key")));

  // Rebuild it under the new key, copying family/qualifier/timestamp/value
  // for every cell rather than re-adding the Cell objects themselves.
  Put put = new Put(Bytes.toBytes("new-key"));
  for (Cell cell : result.rawCells()) {
    put.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
        cell.getTimestamp(), CellUtil.cloneValue(cell));
  }
  destinationTable.put(put);

  connection.close();
}
The reason to avoid Put::addCell(…) is that the Cell objects from the Result still carry the older key, and you'll receive a WrongRowIOException if you attempt to add them to a Put created with a changed key.
Created ‎05-27-2019 08:47 AM
In a mapper job, we can do the following as well.
Basically, it follows the same pattern as column-family renaming.
public class RowKeyRenameImporter extends TableMapper<ImmutableBytesWritable, Mutation> {
  private static final Log LOG = LogFactory.getLog(RowKeyRenameImporter.class);
  public final static String WAL_DURABILITY = "import.wal.durability";
  public final static String ROWKEY_RENAME_IMPL = "row.key.rename";

  // NOTE: these fields are expected to be initialized in a setup() override from the
  // job configuration (WAL_DURABILITY / ROWKEY_RENAME_IMPL); that part is not shown here.
  private List<UUID> clusterIds;
  private Durability durability;
  private RowKeyRename rowkeyRenameImpl;

  /**
   * @param row The current table row key.
   * @param value The columns.
   * @param context The current context.
   * @throws IOException When something is broken with the data.
   */
  @Override
  public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException {
    try {
      writeResult(row, value, context);
    } catch (InterruptedException e) {
      e.printStackTrace();
    }
  }

  private void writeResult(ImmutableBytesWritable key, Result result, Context context)
      throws IOException, InterruptedException {
    Put put = null;
    if (LOG.isTraceEnabled()) {
      LOG.trace("Considering the row " + Bytes.toString(key.get(), key.getOffset(), key.getLength()));
    }
    processKV(key, result, context, put);
  }

  protected void processKV(ImmutableBytesWritable key, Result result, Context context, Put put)
      throws IOException, InterruptedException {
    LOG.info("Renaming the row " + key.toString());
    ImmutableBytesWritable renameRowKey = rowkeyRenameImpl.rowKeyRename(key);

    // Build a single Put carrying every cell of the row under the renamed key.
    for (Cell kv : result.rawCells()) {
      if (put == null) {
        put = new Put(renameRowKey.get());
      }
      Cell renamedKV = convertKv(kv, renameRowKey);
      addPutToKv(put, renamedKV);
    }

    // Write once per row, after all cells have been added.
    if (put != null) {
      if (durability != null) {
        put.setDurability(durability);
      }
      if (clusterIds != null) {
        put.setClusterIds(clusterIds);
      }
      // TableOutputFormat uses the Put's own row key; the map output key is informational.
      context.write(renameRowKey, put);
    }
  }

  // Helper: create a new KeyValue built on the renamed row key, keeping
  // family, qualifier, timestamp, type and value as-is.
  private static Cell convertKv(Cell kv, ImmutableBytesWritable renameRowKey) {
    byte[] newCfName = CellUtil.cloneFamily(kv);
    kv = new KeyValue(renameRowKey.get(),           // row buffer
        renameRowKey.getOffset(),                   // row offset
        renameRowKey.getLength(),                   // row length
        newCfName,                                  // CF buffer
        0,                                          // CF offset
        kv.getFamilyLength(),                       // CF length
        kv.getQualifierArray(),                     // qualifier buffer
        kv.getQualifierOffset(),                    // qualifier offset
        kv.getQualifierLength(),                    // qualifier length
        kv.getTimestamp(),                          // timestamp
        KeyValue.Type.codeToType(kv.getTypeByte()), // KV type
        kv.getValueArray(),                         // value buffer
        kv.getValueOffset(),                        // value offset
        kv.getValueLength());                       // value length
    return kv;
  }

  protected void addPutToKv(Put put, Cell kv) throws IOException {
    put.add(kv);
  }
}
