
Partitioned Hive table with the HBase storage handler?


Hi,

I use CDH.

I have a partitioned table in Hive. It is partitioned by date and holds one month of data, so there are 30 to 31 partitions.

Can I move this table to HBase?

I know how to create an external table in Hive that points to HBase, and I know how to create a partitioned table in Hive. How do I combine the two? Must I use static partitions for this use case? Any other suggestions?

I have 100 million records for one month of data. I want to move them to HBase and write an Impala query for retrieval with good performance.

What I do now is create a staging table in Hive and move the data to HBase.

Now I need partitions in Hive. How can I proceed in this case?

For my use case, I only want to move the data and select it for display. I am not going to do any processing.

Since I have millions of records per month, I want daily partitions, moving records into date partitions so that my select queries respond quickly.

Thanks

1 ACCEPTED SOLUTION

Super Collaborator

Hi,

 

The concept of Hive partitions does not map to HBase tables.

So if you want HBase as the storage, you will need to work around this for your use case.

 

You could use a single HBase table whose row key is constructed from the partition value. That way you should be able to query the table by row key and avoid a full table scan.
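As a rough illustration of the row key idea, here is a minimal Python sketch. It assumes the date is placed first in the key in `YYYYMMDD` form, followed by a `|` separator and a record identifier; the function names, the separator, and the key layout are all hypothetical and only show the principle, not a recommended schema.

```python
from datetime import date
from typing import Tuple


def make_row_key(day: date, record_id: str) -> bytes:
    """Build a row key that starts with the day (hypothetical layout)."""
    return f"{day:%Y%m%d}|{record_id}".encode("utf-8")


def day_scan_range(day: date) -> Tuple[bytes, bytes]:
    """Return (start_row, stop_row) covering every key written for one day.

    HBase sorts rows by byte order, and a scan includes the start row and
    excludes the stop row. '}' is the byte immediately after '|', so the
    stop row sits just past the last possible key for that day.
    """
    prefix = f"{day:%Y%m%d}"
    return (prefix + "|").encode("utf-8"), (prefix + "}").encode("utf-8")


start, stop = day_scan_range(date(2015, 3, 1))
key = make_row_key(date(2015, 3, 1), "abc")
print(start <= key < stop)
```

A scan bounded by these two values only touches the rows for that day, which is what stands in for partition pruning here. One thing to keep in mind: keys that begin with the date are written in increasing order, so a bulk load of a single day may concentrate writes on one region.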

 

Or you could have one HBase table per "partition" (this also means one Hive table per partition).
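To give an idea of what the one-table-per-partition option involves, the sketch below generates one Hive DDL statement per day. The table name prefix `events_`, the columns `rowkey` and `payload`, and the column family `d` are assumptions made up for the example. Because the statement uses `CREATE EXTERNAL TABLE`, the HBase table with the same name is assumed to exist already.

```python
from datetime import date, timedelta
from typing import List


def hive_ddl_for_day(day: date) -> str:
    """Return a Hive DDL statement mapping one per-day HBase table (hypothetical names)."""
    table = f"events_{day:%Y%m%d}"
    return (
        f"CREATE EXTERNAL TABLE {table} (rowkey STRING, payload STRING)\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        "WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,d:payload')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{table}');"
    )


def ddl_for_range(first_day: date, days: int) -> List[str]:
    """Generate one statement per day, starting at first_day."""
    return [hive_ddl_for_day(first_day + timedelta(days=i)) for i in range(days)]


statements = ddl_for_range(date(2015, 3, 1), 31)
print(statements[0])
```

A month of daily data means about 30 such tables, and any query that spans several days has to combine them itself, so this option trades simple per-day lookups for more tables to manage.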

 

Or you could conclude that HBase does not meet your needs and stay in Hive.

 

regards,

Mathieu

 

