How to salt row key in Hbase Table

Rising Star

Hello guys,

I want to salt my row key with a prefix, according to this formula:

 new_row_key = (++index % BUCKETS_NUMBER) + original_key

Where can I add this prefix to my row key? Is it done by coding something, or is it implemented on the back end in SQL?

A related question: if my row key has a prefix and I want to select a specific row key in HBase, I won't know which prefix that row key has. Is this a limitation, or am I seeing things wrong?

Regards,

Francisco

1 ACCEPTED SOLUTION

Super Guru

"Where can I add this prefix to my row key? Is it done by coding something, or is it implemented on the back end in SQL?"

You add it to the code that ingests data into HBase. That is the simplest way to implement this logic.

"If my row key has a prefix and I want to select a specific row key in HBase, I won't know which prefix that row key has. Is this a limitation, or am I seeing things wrong?"

Yes, this is a limitation of your "formula": a prefix taken from a running counter cannot be recovered from the row itself. For salting/hashing, you want a stable hash, so that, given the actual data in the row key, you can recompute what the salt/hash is.
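To illustrate the stable-hash idea above, here is a minimal sketch in Python. It is not HBase or Phoenix code; the names `salt_for` and `salted_key`, the bucket count, the `-` separator, and the choice of CRC32 are all assumptions made for the example. The point is only that the prefix is derived from the original key, so a reader can recompute it.

```python
import zlib

BUCKETS_NUMBER = 8  # assumed bucket count, chosen for illustration


def salt_for(original_key: str, buckets: int = BUCKETS_NUMBER) -> int:
    """Derive the bucket from the key itself so it can be recomputed at read time."""
    # zlib.crc32 gives the same value in every process, unlike Python's
    # built-in hash() for strings, which is randomized per process.
    return zlib.crc32(original_key.encode("utf-8")) % buckets


def salted_key(original_key: str, buckets: int = BUCKETS_NUMBER) -> str:
    """Build the row key as '<zero-padded bucket>-<original key>' (hypothetical layout)."""
    width = len(str(buckets - 1))  # fixed-width prefix keeps buckets sorted together
    return f"{salt_for(original_key, buckets):0{width}d}-{original_key}"


# Writer and reader call the same function, so a point lookup needs no guessing.
write_key = salted_key("customer-42")
read_key = salted_key("customer-42")
```

With the counter-based formula from the question, the reader would instead have to try every one of the `BUCKETS_NUMBER` possible prefixes.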

I'd recommend that you use Apache Phoenix if you want to implement salting. It provides this as a feature and would likely save you a bit of time/effort.


6 REPLIES 6

Rising Star

Thanks for your feedback Josh.

At which stage can I implement that?

I'm importing a table from SQL Server to HBase via Sqoop. Also, thanks for your suggestion about Apache Phoenix.

Expert Contributor

I would strongly recommend that you reconsider your use case of importing a table from SQL Server into HBase. HBase is not a relational database, and most of the practices applied to relational databases either will not work or will degrade HBase performance. Consider Hive or similar technologies for SQL Server or relational DB offload.

For example, if you don't know your row key prefix in HBase, you will end up doing a full table scan, which is an expensive operation in HBase. So designing the row key is the most important step in HBase, unlike in relational DBs. More information on this, and why, here:

https://hbase.apache.org/book.html#rowkey.design
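As a rough illustration of the read-side cost, the sketch below (plain Python, not the HBase client API) computes the start/stop row pairs needed to read every key that begins with a given prefix when the table is salted. The function names, the `<bucket>-<key>` layout, and the bucket count are assumptions for the example. When the set of salt prefixes is small and known, one bounded scan per bucket is an alternative to a full table scan.

```python
from typing import List, Tuple


def next_prefix(prefix: bytes) -> bytes:
    """Return the smallest byte string greater than every string starting with prefix."""
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""  # empty stop row: no upper bound exists
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def bucket_scan_ranges(key_prefix: bytes, buckets: int) -> List[Tuple[bytes, bytes]]:
    """One (start_row, exclusive_stop_row) pair per salt bucket (hypothetical key layout)."""
    width = len(str(buckets - 1))
    ranges = []
    for bucket in range(buckets):
        start = f"{bucket:0{width}d}-".encode("ascii") + key_prefix
        ranges.append((start, next_prefix(start)))
    return ranges


# Four buckets means four bounded scans instead of one scan over the whole table.
ranges = bucket_scan_ranges(b"cust", buckets=4)
```

Each pair would then be passed to a separate scan, and the results merged on the client side.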

Rising Star

Hi @Umair Khan,

To explain the whole scenario: I'm doing a small project (for my master's thesis) in which I'm trying to use some of the capabilities of Hortonworks and the Hadoop world.

I have different types of data, and one of them is stored in SQL Server. It's 4 or 5 simple tables, which I can model as 2 in HBase. My idea for this dataset is to store it in a non-relational database like HBase and query it with Hive, exploring the capabilities of HBase and Hive together.

I know that HBase isn't the ideal tool for retrieving a specific record, even more so when the row key is prefixed. But I'm trying to implement one of these techniques to optimize and to prevent region hotspotting.

So, where can I implement a reversed or salted row key? I can do this in SQL, but I think it's very inefficient...
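For reference, a reversed row key is a small transformation that can live in whatever code writes the rows. The sketch below is plain Python with a hypothetical function name and made-up sample IDs; it only shows the transformation itself, not how it would be wired into a Sqoop import.

```python
def reversed_key(original_key: str) -> str:
    """Reverse the key so the fastest-changing characters come first."""
    return original_key[::-1]


# Sequential IDs share a long common prefix and would land in the same region;
# reversed, their leading characters differ, which spreads the writes.
sequential_ids = ["100001", "100002", "100003"]
row_keys = [reversed_key(k) for k in sequential_ids]
```

Because reversing is its own inverse, a reader who knows the original key can always rebuild the stored row key, which is the property the counter-based salt lacks.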

Super Guru

At risk of repeating myself, you should use Apache Phoenix. It will give you the perks of HBase as a backing store with a SQL front-end for you to use as an entry-point.

Rising Star

@Josh Elser If I use Apache Phoenix to create a view over my HBase table, I don't need to care about prefixing my row key, right? I just need to specify SALT_BUCKETS and pre-splitting (if not, it's automatic).