
Hbase integration with hive


New Contributor

I have an HBase table whose 56 regions all ended up on two region servers, although the cluster has 8 region servers. I suspect this is why all 56 mappers get stuck in processing when I read the table through Hive. What can I do to speed this up? My guess is that the 56 regions need to be distributed across the other region servers. Does anyone know how this can be done?


Re: Hbase integration with hive

@ammu ch Are only that single table's regions unbalanced, or are the regions of all tables unbalanced across the cluster? I would suggest running the balancer until the regions are balanced. Once that table's regions are balanced, I expect you will see an improvement in your job.

hbase> balance_switch true

Re: Hbase integration with hive

New Contributor

Jitendra, when you say balance_switch true, what does it do? The data on the region servers is already balanced. My issue is that the 56 regions of this table are loaded on only two region servers instead of eight.

Re: Hbase integration with hive

If that is the case, then you have to move the regions of that table manually. Try this on a few regions and see whether they get distributed. Here "SERVER_NAME" is optional; if you don't provide it, a random region server is picked.

hbase> move 'ENCODED_REGIONNAME', 'SERVER_NAME'
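With 56 regions, typing each move by hand gets tedious, so here is a rough sketch of how the commands could be generated round-robin across 8 servers. The encoded region names and server names below are made-up placeholders, not values from this cluster; you would substitute the real ones (for example from the HBase Master UI). The script only prints shell commands and does not talk to HBase.

```python
from collections import Counter
from itertools import cycle


def plan_moves(encoded_regions, server_names):
    """Pair each region with a target server in round-robin order."""
    return list(zip(encoded_regions, cycle(server_names)))


# Placeholder values (assumptions): replace with the real encoded region
# names and 'host,port,startcode' server names of your cluster.
encoded_regions = [f"region{i:02d}" for i in range(56)]
server_names = [f"rs{i}.example.com,60020,1400000000000" for i in range(1, 9)]

plan = plan_moves(encoded_regions, server_names)

# One hbase shell 'move' command per region.
commands = [f"move '{region}', '{server}'" for region, server in plan]

# How many regions each server would receive under this plan.
per_server = Counter(server for _, server in plan)

for command in commands[:3]:
    print(command)
print(sorted(per_server.values()))
```

Under these assumptions each of the 8 servers receives 7 of the 56 regions. Note that if the balancer is enabled it may later move regions again, so this is only a starting point.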


Re: Hbase integration with hive

@ammu ch as Jitendra mentioned, you can balance your HBase tables. The first step is to determine how and why your job is stuck. Skew is one possible cause; inefficient use of the row keys is another.

Alternatively, to make things faster you can use Hive to read snapshots of the HBase table. This can be significantly faster because the data is read from HDFS and not through the HBase online API. This presentation has further information if you want: http://fr.slideshare.net/HBaseCon/ecosystem-session-3a

hope this helps

Re: Hbase integration with hive

Super Collaborator

ammu:

Which release of HDP are you using?

When the regions in the cluster are balanced, it is not guaranteed that regions per table would be balanced.

Here is the related cost key from StochasticLoadBalancer:

private static final String TABLE_SKEW_COST_KEY = "hbase.master.balancer.stochastic.tableSkewCost";
private static final float DEFAULT_TABLE_SKEW_COST = 35;

You can increase the value for the key so that regions per table are better balanced.
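To illustrate why a balanced cluster does not imply a balanced table, here is a small sketch with made-up numbers. The table names, server names, and region counts are hypothetical, and the counting below is only an illustration; it is not how StochasticLoadBalancer computes its cost.

```python
from collections import Counter


def regions_per_server(assignments, table=None):
    """Count regions per server, optionally restricted to one table.

    assignments is a list of (table_name, server_name) pairs, one per region.
    """
    return Counter(
        server for name, server in assignments if table is None or name == table
    )


# Hypothetical layout: 'table_a' has 56 regions, all on rs1 and rs2, while
# regions of other tables fill the remaining six servers.
servers = [f"rs{i}" for i in range(1, 9)]
assignments = [("table_a", "rs1")] * 28 + [("table_a", "rs2")] * 28
for server in servers[2:]:
    assignments += [("other_tables", server)] * 28

cluster_view = regions_per_server(assignments)
table_view = regions_per_server(assignments, table="table_a")

print("cluster:", dict(cluster_view))
print("table_a:", dict(table_view))
```

In this layout every server holds 28 regions, so the cluster looks balanced, yet all of table_a sits on two servers. That is the situation where raising the table skew cost may help.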

What was the load like on the region servers when the Hive job was running?

Have you disabled swapping?

If you can provide some more details, that would help us determine the cause.

Re: Hbase integration with hive

New Contributor

We are using HDP 2.2, HBase 0.98, and Hive 0.14.

Yes, all the region servers are already balanced, but for this table all 56 regions are stored on only two servers. For other tables, I see the regions distributed across all region servers.

Re: Hbase integration with hive

Super Collaborator

Were some of the region servers undergoing GC pauses during the job run?

Please share JVM parameters if you need tuning advice.