
How to update an HBase row key

Explorer

I know that technically speaking updates in HBase do not happen, but is there a way to change the row key of certain rows without modifying the values for that row? I am trying to find the best way to perform a get, modify the row key, and then put the row back into place with the modified row key. It would also be nice if the timestamp could stay the same as the original... Does anyone have any examples of how to perform something like this?

7 REPLIES

Mentor
The Result's Cell API fetches the timestamp of the selected row/column when reading: http://archive.cloudera.com/cdh5/cdh/5/hbase/apidocs/org/apache/hadoop/hbase/Cell.html#getTimestamp() and the Put API allows you to specify one when writing: http://archive.cloudera.com/cdh5/cdh/5/hbase/apidocs/org/apache/hadoop/hbase/client/Put.html#addColu...[])
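
As a minimal sketch of those two calls (the table handle, row keys, and the cf/q column names here are made up for illustration), carrying the timestamp across looks like this:

Result result = table.get(new Get(Bytes.toBytes("row-1")));
Cell cell = result.getColumnLatestCell(Bytes.toBytes("cf"), Bytes.toBytes("q"));
long originalTs = cell.getTimestamp();            // timestamp read from the existing cell

Put put = new Put(Bytes.toBytes("row-1-new"));
put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"),
    originalTs,                                   // carry the same timestamp forward
    CellUtil.cloneValue(cell));
table.put(put);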

Row keys are immutable, so what you are looking to do cannot be done in-place. I'd recommend running an MR job that reads and transforms the data from the old table and populates a new one. Pre-split the new table adequately for the changed row key format to get better performance during this job.

After the transformation, you can rename the new table back to the original name if you'd like.

MR input would be a TableInputFormat over the source table.
The input Scan should likely also filter for the rows you are specifically targeting.
MR output would be a TableOutputFormat for the destination table.
The map function would hold the row key transformation code: it transfers the Result's Cell contents into a Put with just the row key altered to the new format, retaining all other columnar data as-is via the APIs above.

Alternatively, your destination table can be the same as the source, but then also issue a Delete for the old row key copy at the end of the job/transformation. A rough sketch of the job wiring is below.
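
For reference, a driver for such a job might look roughly like this (a sketch only: the table names, the PrefixFilter, and the RowKeyRenameMapper class are placeholders you would replace with your own):

Configuration conf = HBaseConfiguration.create();
Job job = Job.getInstance(conf, "rowkey-rename");
job.setJarByClass(RowKeyRenameMapper.class);

Scan scan = new Scan();
scan.setCaching(500);
scan.setCacheBlocks(false);                                     // recommended for MR scans over HBase
scan.setFilter(new PrefixFilter(Bytes.toBytes("old-prefix")));  // only the targeted rows

// Source: TableInputFormat over the old table, feeding the renaming mapper
TableMapReduceUtil.initTableMapperJob("old_table", scan, RowKeyRenameMapper.class,
    ImmutableBytesWritable.class, Put.class, job);
// Sink: TableOutputFormat writing to the (pre-split) destination table; map-only, no reducer
TableMapReduceUtil.initTableReducerJob("new_table", null, job);
job.setNumReduceTasks(0);

System.exit(job.waitForCompletion(true) ? 0 : 1);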

Explorer

Thanks Harsh!

 

Any idea if it is possible to keep the same timestamp throughout this process?

New Contributor

I have a relatively easy solution to this problem. I created a PairRDD of the rows that I wanted to update. Then, for every row, I created a Delete and a Put object, so it deletes the old record and inserts a new one. The only thing to take care of is that the Put object should include the new row key. Then just call saveAsNewAPIHadoopDataset on the new RDD.
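
Not the poster's actual code, but a rough sketch of that approach with the Spark 2.x Java API (imports omitted; it assumes rows is a JavaPairRDD<ImmutableBytesWritable, Result> obtained via newAPIHadoopRDD over TableInputFormat, and the table name and renaming rule are made up):

Job job = Job.getInstance(HBaseConfiguration.create());
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "my_table");
job.setOutputFormatClass(TableOutputFormat.class);

JavaPairRDD<ImmutableBytesWritable, Mutation> mutations =
    rows.flatMapToPair(tuple -> {
      Result result = tuple._2();
      byte[] oldKey = result.getRow();
      byte[] newKey = Bytes.add(Bytes.toBytes("new-"), oldKey);  // made-up renaming rule
      Put put = new Put(newKey);
      for (Cell cell : result.rawCells()) {
        put.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
            cell.getTimestamp(), CellUtil.cloneValue(cell));     // keep the original timestamps
      }
      Delete delete = new Delete(oldKey);                        // remove the row under the old key
      return Arrays.asList(
          new Tuple2<>(new ImmutableBytesWritable(newKey), (Mutation) put),
          new Tuple2<>(new ImmutableBytesWritable(oldKey), (Mutation) delete)).iterator();
    });

mutations.saveAsNewAPIHadoopDataset(job.getConfiguration());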

New Contributor

Do you have the code for your response? That would be awesome.

Explorer

Hi,

 

In our case, it was a matter of updating the rowkey using data that was in another row. So essentially, we grabbed the data from the "good" row, and saved it to a variable. Next, we did an HBase put using that variable like so:

 

Get get = new Get(Bytes.toBytes(currentRowkey));
Result result = table.get(get);

. . .

// createPut and checkAndPut below are our own helper methods, not the HBase client API
Put dataPut = createPut(Bytes.toBytes(correctedRowkey), hbaseColFamilyFile, hbaseColQualifierFile, result);
Status dataStatus = checkAndPut(currentRowkey, correctedRowkey, hbaseColFamilyFile, hbaseColQualifierFile, table, dataPut, "data");

 

If you'd like to get fancy, you can also do a checkAndPut directly on the Table, like so (passing null as the expected value means the put only goes through if that cell does not already exist):

 

table.checkAndPut(Bytes.toBytes(correctedRowkey), hbasecolfamily, hbasecolqualifier, null, put);

Mentor

Small note that's relevant to this (older) topic:

 

When copying Cells from a fetched Scan/Get Result into a new Put object with the altered key, do not add the Cell objects as-is via the Put::addCell(…) API. Instead, copy only the value portions.

 

A demo program for a single key operation would look like this:

 

 

public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection connection = ConnectionFactory.createConnection(conf);
    Table sourceTable = connection.getTable(TableName.valueOf("old_table"));
    Table destinationTable = connection.getTable(TableName.valueOf("new_table"));
    Result result = sourceTable.get(new Get(Bytes.toBytes("old-key")));
    Put put = new Put(Bytes.toBytes("new-key"));
    for (Cell cell : result.rawCells()) {
      // Clone only the family/qualifier/value bytes and carry the original timestamp;
      // the raw getFamilyArray()/getValueArray() arrays are whole backing buffers, not just the portions we want
      put.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
          cell.getTimestamp(), CellUtil.cloneValue(cell));
    }
    destinationTable.put(put);
    connection.close();
  }

The reason to avoid Put::addCell(…) is that the Cell objects from the Result still carry the older key, so you'll receive a WrongRowIOException if you attempt to add them to a Put created with the changed key.

 

New Contributor

In a mapper job, we can do the following as well.

It basically follows the same pattern as renaming a column family.

 

 

public class RowKeyRenameImporter extends TableMapper<ImmutableBytesWritable, Mutation> {
	private static final Log LOG = LogFactory.getLog(RowKeyRenameImporter.class);
	public final static String WAL_DURABILITY = "import.wal.durability";
	public final static String ROWKEY_RENAME_IMPL = "row.key.rename";
	// Expected to be initialized from the job configuration in setup() (not shown in this snippet)
	private List<UUID> clusterIds;
	private Durability durability;
	private RowKeyRename rowkeyRenameImpl;

	/**
	 * @param row     The current table row key.
	 * @param value   The columns.
	 * @param context The current context.
	 * @throws IOException When something is broken with the data.
	 */
	@Override
	public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException {
		try {
			writeResult(row, value, context);
		} catch (InterruptedException e) {
			Thread.currentThread().interrupt();
			throw new IOException("Interrupted while writing the renamed row", e);
		}
	}

	private void writeResult(ImmutableBytesWritable key, Result result, Context context)
			throws IOException, InterruptedException {
		Put put = null;
		if (LOG.isTraceEnabled()) {
			LOG.trace("Considering the row." + Bytes.toString(key.get(), key.getOffset(), key.getLength()));
		}
		processKV(key, result, context, put);
	}

	protected void processKV(ImmutableBytesWritable key, Result result, Context context, Put put)
			throws IOException, InterruptedException {
		LOG.info("Renaming the row " + Bytes.toString(key.get(), key.getOffset(), key.getLength()));
		ImmutableBytesWritable renameRowKey = rowkeyRenameImpl.rowKeyRename(key);
		for (Cell kv : result.rawCells()) {
			if (put == null) {
				put = new Put(renameRowKey.copyBytes());
			}
			// Rebuild each KeyValue against the renamed row before adding it to the Put
			Cell renamedKV = convertKv(kv, renameRowKey);
			addPutToKv(put, renamedKV);
		}
		if (put != null) {
			if (durability != null) {
				put.setDurability(durability);
			}
			put.setClusterIds(clusterIds);
			// Emit the row once, keyed by the renamed row key
			context.write(renameRowKey, put);
		}
	}

	// helper: create a new KeyValue based on renaming of row Key
	private static Cell convertKv(Cell kv, ImmutableBytesWritable renameRowKey) {
		byte[] newCfName = CellUtil.cloneFamily(kv);

		kv = new KeyValue(renameRowKey.get(), // row buffer
				renameRowKey.getOffset(), // row offset
				renameRowKey.getLength(), // row length
				newCfName, // CF buffer
				0, // CF offset
				kv.getFamilyLength(), // CF length
				kv.getQualifierArray(), // qualifier buffer
				kv.getQualifierOffset(), // qualifier offset
				kv.getQualifierLength(), // qualifier length
				kv.getTimestamp(), // timestamp
				KeyValue.Type.codeToType(kv.getTypeByte()), // KV Type
				kv.getValueArray(), // value buffer
				kv.getValueOffset(), // value offset
				kv.getValueLength()); // value length
		return kv;
	}

	protected void addPutToKv(Put put, Cell kv) throws IOException {
		put.add(kv);
	}
}