Created on 09-09-2016 08:59 AM - edited 09-16-2022 03:38 AM
I know that technically speaking updates in HBase do not happen, but is there a way to change the row key of certain rows without modifying the values for that row? I am trying to find the best way to perform a get, modify the row key, and then put the row back into place with the modified row key. It would also be nice if the timestamp could stay the same as the original . . . Does anyone have any examples of how to perform something like this?
Created 09-11-2016 04:06 AM
Created 09-15-2016 06:59 AM
Thanks Harsh!
Any idea if it is possible to keep the same timestamp throughout this process?
Created 12-11-2017 11:00 AM
I have a relatively easy solution to this problem. I created a PairRDD of the rows that I wanted to update. Then, for every row, I created a Delete and a Put object, so it deletes the old record and inserts a new one. The only thing to take care of is that the Put object must be created with the new row key. Then just call saveAsNewAPIHadoopDataset on the new RDD.
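For readers who want the same Delete-plus-Put pattern without Spark, here is a minimal sketch using the plain HBase Java client. The class and method names (RowKeyMover, moveRow) are made up for illustration, and it assumes an HBase 1.x or 2.x client on the classpath. It copies only the latest version of each column, keeps the original cell timestamps, and is not atomic across the two rows.

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

// Hypothetical helper, not part of HBase.
public final class RowKeyMover {

    /**
     * Copies the latest version of every column from oldKey to newKey,
     * preserving each cell's timestamp, then deletes the old row.
     * Returns false if the old row does not exist or the keys are equal.
     */
    public static boolean moveRow(Table table, byte[] oldKey, byte[] newKey) throws IOException {
        if (Arrays.equals(oldKey, newKey)) {
            return false; // nothing to do, and the delete below would wipe the row
        }
        Result result = table.get(new Get(oldKey));
        if (result.isEmpty()) {
            return false;
        }
        Put put = new Put(newKey);
        for (Cell cell : result.rawCells()) {
            // Clone family, qualifier and value; pass the original timestamp explicitly.
            put.addColumn(CellUtil.cloneFamily(cell),
                          CellUtil.cloneQualifier(cell),
                          cell.getTimestamp(),
                          CellUtil.cloneValue(cell));
        }
        table.put(put);                   // write the new row first
        table.delete(new Delete(oldKey)); // then remove the old one
        return true;
    }
}
```

Writing the new row before deleting the old one means a failure between the two steps leaves a duplicate rather than losing data.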
Created 06-12-2018 08:23 AM
Do you have the code for your response? That would be awesome.
Created 06-12-2018 08:37 AM
Hi,
In our case, it was a matter of updating the rowkey using data that was in another row. So essentially, we grabbed the data from the "good" row, and saved it to a variable. Next, we did an HBase put using that variable like so:
Get get = new Get(Bytes.toBytes(currentRowkey));
Result result = table.get(get);
. . .
Put dataPut = createPut(Bytes.toBytes(correctedRowkey), hbaseColFamilyFile, hbaseColQualifierFile, result);
Status dataStatus = checkAndPut(currentRowkey, correctedRowkey, hbaseColFamilyFile, hbaseColQualifierFile, table, dataPut, "data");
If you'd like to get fancy, you can do a checkAndPut also like so:
table.checkAndPut(Bytes.toBytes(correctedRowkey), hbasecolfamily, hbasecolqualifier, null, put);
Created on 05-27-2019 06:35 AM - edited 05-27-2019 06:37 AM
Small note that's relevant to this (older) topic:
When copying over Cells from one fetched Scan/Get Result to another Put object with the altered key, do not add the Cell objects as-is via Put::addCell(…) API. You'll need to instead copy the value portions exclusively.
A demo program for a single key operation would look like this:
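// Needs org.apache.hadoop.conf.Configuration, org.apache.hadoop.hbase.{Cell, CellUtil, HBaseConfiguration, TableName},
// org.apache.hadoop.hbase.client.* and org.apache.hadoop.hbase.util.Bytes.
public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table sourceTable = connection.getTable(TableName.valueOf("old_table"));
         Table destinationTable = connection.getTable(TableName.valueOf("new_table"))) {
        Result result = sourceTable.get(new Get(Bytes.toBytes("old-key")));
        Put put = new Put(Bytes.toBytes("new-key"));
        for (Cell cell : result.rawCells()) {
            // getFamilyArray()/getQualifierArray()/getValueArray() return the whole backing
            // buffer, so clone only the relevant bytes. The original timestamp is kept.
            put.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
                          cell.getTimestamp(), CellUtil.cloneValue(cell));
        }
        destinationTable.put(put);
    }
}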
The reason to avoid Put::addCell(…) is that the Cell objects from Result will still carry the older key and you'll receive a WrongRowIOException if you attempt to use it with a Put object initiated with a changed key.
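A related pitfall when copying only the value portions: a Cell's getFamilyArray(), getQualifierArray() and getValueArray() return the whole backing buffer, and the matching offset and length accessors say which bytes belong to that part. The sketch below models this with plain byte arrays and no HBase dependency. The class name, the helper and the simplified buffer layout are assumptions for illustration only (a real KeyValue buffer also contains length prefixes, the timestamp and the type byte).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-alone model; not an HBase class.
final class CellSliceDemo {

    // Copies exactly `length` bytes starting at `offset`, which is what
    // the CellUtil.clone* helpers do for each part of a cell.
    static byte[] slice(byte[] backing, int offset, int length) {
        return Arrays.copyOfRange(backing, offset, offset + length);
    }

    public static void main(String[] args) {
        // Simplified layout: row "old-key", family "cf", qualifier "qual", value "value".
        byte[] backing = "old-keycfqualvalue".getBytes(StandardCharsets.UTF_8);
        int valueOffset = 13;
        int valueLength = 5;

        // Using the raw buffer as the value would drag the old key along with it.
        System.out.println(new String(backing, StandardCharsets.UTF_8));

        // Slicing by offset and length yields only the value bytes.
        byte[] value = slice(backing, valueOffset, valueLength);
        System.out.println(new String(value, StandardCharsets.UTF_8));
    }
}
```

In real code, CellUtil.cloneFamily, CellUtil.cloneQualifier and CellUtil.cloneValue perform this slicing for you.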
Created 05-27-2019 08:47 AM
In a mapper job, we can do the following as well.
Basically it follows the same pattern of CF renaming.
public class RowKeyRenameImporter extends TableMapper<ImmutableBytesWritable, Mutation> {
    private static final Log LOG = LogFactory.getLog(RowKeyRenameImporter.class);
    public final static String WAL_DURABILITY = "import.wal.durability";
    public final static String ROWKEY_RENAME_IMPL = "row.key.rename";

    // These fields must be initialized in setup(), which is not shown here.
    private List<UUID> clusterIds;
    private Durability durability;
    private RowKeyRename rowkeyRenameImpl;

    /**
     * @param row     The current table row key.
     * @param value   The columns.
     * @param context The current context.
     * @throws IOException When something is broken with the data.
     */
    @Override
    public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException {
        try {
            writeResult(row, value, context);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    private void writeResult(ImmutableBytesWritable key, Result result, Context context)
            throws IOException, InterruptedException {
        Put put = null;
        if (LOG.isTraceEnabled()) {
            LOG.trace("Considering the row." + Bytes.toString(key.get(), key.getOffset(), key.getLength()));
        }
        processKV(key, result, context, put);
    }

    protected void processKV(ImmutableBytesWritable key, Result result, Context context, Put put)
            throws IOException, InterruptedException {
        LOG.info("Renaming the row " + key.toString());
        ImmutableBytesWritable renameRowKey = rowkeyRenameImpl.rowKeyRename(key);
        // Collect every cell of the row into a single Put under the new row key.
        for (Cell kv : result.rawCells()) {
            if (put == null) {
                put = new Put(renameRowKey.get());
            }
            Cell renamedKV = convertKv(kv, renameRowKey);
            addPutToKv(put, renamedKV);
        }
        // Emit the Put once per row, after all cells have been added.
        if (put != null) {
            if (durability != null) {
                put.setDurability(durability);
            }
            put.setClusterIds(clusterIds);
            context.write(key, put);
        }
    }

    // helper: create a new KeyValue based on renaming of row Key
    private static Cell convertKv(Cell kv, ImmutableBytesWritable renameRowKey) {
        byte[] newCfName = CellUtil.cloneFamily(kv);
        kv = new KeyValue(renameRowKey.get(),        // row buffer
                renameRowKey.getOffset(),            // row offset
                renameRowKey.getLength(),            // row length
                newCfName,                           // CF buffer
                0,                                   // CF offset
                kv.getFamilyLength(),                // CF length
                kv.getQualifierArray(),              // qualifier buffer
                kv.getQualifierOffset(),             // qualifier offset
                kv.getQualifierLength(),             // qualifier length
                kv.getTimestamp(),                   // timestamp
                KeyValue.Type.codeToType(kv.getTypeByte()), // KV Type
                kv.getValueArray(),                  // value buffer
                kv.getValueOffset(),                 // value offset
                kv.getValueLength());                // value length
        return kv;
    }

    protected void addPutToKv(Put put, Cell kv) throws IOException {
        put.add(kv);
    }
}