Created 12-12-2016 12:40 PM
Can I implement the following scenario?
1. Keep a single copy of the data.
2. UPDATE/DELETE/INSERT in HBase.
3. Query the table in Hive.
4. How does query performance in Hive on HBase compare to ORC?
5. Or should I just turn on ACID in Hive to implement the above?
Thanks
Created 12-13-2016 04:01 AM
Created 12-12-2016 12:48 PM
Hello
You can definitely load data into HDFS and then into HBase through Hive. You can also query HBase through Hive using the HBase storage handler.
Please refer here for more detailed explanation: https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
If this is derived from a Hive table it has a schema, so I would also consider the Hive / Phoenix storage handler: https://phoenix.apache.org/hive_storage_handler.html
From a performance standpoint, querying HBase through Hive should overall be less performant than querying ORC tables. That being said, it depends on the query pattern and the use case.
regards
Created 12-13-2016 03:43 AM
Thanks @nmaillard. And how about ACID performance?
Created 12-13-2016 06:37 AM
Our HDP 2.5's Phoenix version is 4.7.
Created 12-13-2016 06:36 AM
@mqureshi, Thanks for your response.
We use Sqoop to import data from Oracle tables into HDFS (Hive external tables) and then insert it into ORC tables in Hive to support data analytics. ACID is currently not turned on in our Hive. Most of the tables are currently smaller than 1 TB. Now there is a requirement to update the imported table data in Hive, because the source data gets updated. I searched the web and it seems that ACID does not perform very well on updates, and that ACID tables are also not recognized outside of Hive (e.g. by Spark). We are looking for the best-performing approach. Should I implement it using the HBase storage handler or sqoop merge?
Created 12-13-2016 03:54 PM
HBaseStorageHandler is what is required to read HBase tables from Hive. At the end of the day, you first have to create and manage HBase and then use Hive. Since you are going to be doing updates, this might be the best way to go about it, but I would strongly recommend looking at the following approach first. The reason is probably my personal preference for not using HBase until it is required, as it is complex and the skill set needed to implement it successfully is hard to find. That being said, in your use case, if you don't like the following approach, I'd prefer HBase over Hive ACID.
http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/
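The core idea of an incremental-update approach like this is to reconcile the existing base data with a newly imported batch of changes, keeping only the latest version of each row. Below is a minimal sketch of that reconcile logic in plain Python (not HiveQL), just to illustrate the idea. The row layout (id, modified timestamp, payload) and the names `reconcile`, `base_table` and `incremental_table` are assumptions made for illustration; they do not come from your schema.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (id, modified_ts, payload).
Row = Tuple[int, str, str]


def reconcile(base: Iterable[Row], incremental: Iterable[Row]) -> List[Row]:
    """Keep, for every id, the row with the latest modified timestamp."""
    latest: Dict[int, Row] = {}
    for row in list(base) + list(incremental):
        key, modified = row[0], row[1]
        current = latest.get(key)
        # ISO-8601 date strings sort chronologically, so plain string
        # comparison works. On a tie the later input (incremental) wins.
        if current is None or modified >= current[1]:
            latest[key] = row
    return sorted(latest.values())


# Rows already loaded into the ORC table (illustrative data).
base_table = [
    (1, "2016-12-01", "alice"),
    (2, "2016-12-01", "bob"),
]
# Rows changed or added at the source since the last import.
incremental_table = [
    (2, "2016-12-12", "bob-updated"),
    (3, "2016-12-12", "carol"),
]

reconciled = reconcile(base_table, incremental_table)
for row in reconciled:
    print(row)
```

In Hive the same effect would typically be achieved with a view or query that picks the newest row per key and then overwrites the reporting table. Note that this sketch only covers inserts and updates; deletes at the source would need separate handling, for example a delete flag column.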