Hive and HBase table
Labels: Apache HBase, Apache Hive
Created 12-12-2016 12:40 PM
Can I implement the following scenario?
1. Keep only one copy of the data.
2. UPDATE/DELETE/INSERT the data in HBase.
3. Query the table in Hive.
4. How does Hive query performance over HBase compare to ORC tables?
5. Or should I just turn on ACID in Hive to implement the above?
Thanks
Created 12-12-2016 12:48 PM
Hello,
You can definitely upload data into HDFS and then into HBase through Hive. You can also query HBase through Hive using the HBase storage handler.
Please refer here for a more detailed explanation: https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
If the data is derived from a Hive table it already has a schema, so I would also consider the Hive/Phoenix storage handler: https://phoenix.apache.org/hive_storage_handler.html
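For illustration, mapping an HBase table into Hive with the HBase storage handler looks roughly like the sketch below; the table, column family, and column names are made up for the example and are not from this thread.

```sql
-- Minimal sketch: expose an HBase table to Hive through the HBase storage handler.
-- Table, column family, and column names here are hypothetical.
CREATE EXTERNAL TABLE hbase_orders (
  order_key STRING,
  customer  STRING,
  amount    DOUBLE
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  -- Map Hive columns to the HBase row key and column family:qualifier pairs.
  "hbase.columns.mapping" = ":key,cf:customer,cf:amount"
)
TBLPROPERTIES ("hbase.table.name" = "orders");

-- Rows written or updated directly in HBase are then visible to Hive queries:
SELECT customer, SUM(amount) FROM hbase_orders GROUP BY customer;
```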
From a performance standpoint, querying HBase through Hive will generally be less performant than querying ORC tables. That being said, it depends on the query pattern and the use case.
Regards
Created 12-13-2016 03:43 AM
Thanks @nmaillard. And how about ACID performance?
Created 12-13-2016 06:37 AM
The Phoenix version in our HDP 2.5 is 4.7.
Created 12-13-2016 06:36 AM
@mqureshi, thanks for your response.
We use Sqoop to import data from Oracle tables into HDFS (Hive external tables) and then insert it into ORC tables in Hive to support data analytics. ACID is currently not turned on in our Hive, and most of the tables are smaller than 1 TB. There is now a requirement to update the imported table data in Hive because the source data gets updated. From what I found on the web, ACID update performance is not very good, and ACID tables are not recognized outside of Hive (e.g. by Spark). We are looking for the most performant approach, so I am considering implementing this with the HBase storage handler or with a Sqoop merge.
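For context, the load path described above (a Hive external table over the Sqoop output directory, then an insert into an ORC table) might look roughly like this; all table, column, and path names are made up for the sketch.

```sql
-- Hypothetical sketch of the described pipeline; names and paths are illustrative.
-- 1. External table over the directory Sqoop imports into.
CREATE EXTERNAL TABLE staging_customers (
  id         BIGINT,
  name       STRING,
  updated_at TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/sqoop/customers';

-- 2. Managed ORC table used for analytics.
CREATE TABLE customers_orc (
  id         BIGINT,
  name       STRING,
  updated_at TIMESTAMP
)
STORED AS ORC;

-- 3. (Re)load the ORC table from the staged data.
INSERT OVERWRITE TABLE customers_orc
SELECT id, name, updated_at FROM staging_customers;
```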
Created 12-13-2016 03:54 PM
The HBaseStorageHandler is what is required to read HBase tables from Hive. At the end of the day, you first have to create and manage the tables in HBase and then query them through Hive. Since you are going to be doing updates, this might be the best way to go about it, but I would strongly recommend looking at the following approach. The reason is probably my personal preference of not using HBase until it is required, because it is complex and the skill set needed to implement it successfully is hard to find. That being said, for your use case, if you don't like the following approach, I would prefer HBase over Hive ACID.
http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/
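In essence, that strategy merges an incremental extract with the base table inside Hive instead of updating rows in place. Below is a minimal sketch of the idea, with assumed table names, a key column id, and a last-modified column updated_at; the linked post spells out the full four steps.

```sql
-- Hypothetical sketch of the reconcile step: keep the newest version of each row
-- from the base table plus the newly imported increment. Names are illustrative.
CREATE VIEW reconcile_view AS
SELECT id, name, updated_at
FROM (
  SELECT id, name, updated_at,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
  FROM (
    SELECT id, name, updated_at FROM base_table
    UNION ALL
    SELECT id, name, updated_at FROM incremental_table
  ) unioned
) ranked
WHERE rn = 1;

-- Rebuild the ORC reporting table from the reconciled data,
-- so source-side updates replace old rows without needing Hive ACID.
INSERT OVERWRITE TABLE reporting_table
SELECT id, name, updated_at FROM reconcile_view;
```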
