<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive and Hbase table in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hive-and-Hbase-table/m-p/148990#M111516</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;You can definitely load data into HDFS and then into HBase through Hive. You can also query HBase through Hive using the HBase storage handler.&lt;/P&gt;&lt;P&gt;Please refer here for a more detailed explanation: &lt;A href="https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration"&gt;https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If this data is derived from a Hive table, it has a schema, so I would also consider the Hive/Phoenix storage handler: &lt;A href="https://phoenix.apache.org/hive_storage_handler.html"&gt;https://phoenix.apache.org/hive_storage_handler.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;From a performance standpoint, querying HBase through Hive should generally be slower than querying ORC tables. That being said, it depends on the query pattern and the use case.&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
    <pubDate>Mon, 12 Dec 2016 20:48:02 GMT</pubDate>
    <dc:creator>nmaillard1</dc:creator>
    <dc:date>2016-12-12T20:48:02Z</dc:date>
    <item>
      <title>Hive and Hbase table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-and-Hbase-table/m-p/148989#M111515</link>
      <description>&lt;P&gt;Can I implement the following scenario?&lt;/P&gt;&lt;P&gt;1. Keep a single copy of the data.&lt;/P&gt;&lt;P&gt;2. Run UPDATE/DELETE/INSERT in HBase.&lt;/P&gt;&lt;P&gt;3. Query the table in Hive.&lt;/P&gt;&lt;P&gt;4. How does query performance in Hive compare to ORC?&lt;/P&gt;&lt;P&gt;5. Or should I just turn on ACID in Hive to implement the above?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 12 Dec 2016 20:40:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-and-Hbase-table/m-p/148989#M111515</guid>
      <dc:creator>diablo2</dc:creator>
      <dc:date>2016-12-12T20:40:14Z</dc:date>
    </item>
    <item>
      <title>Re: Hive and Hbase table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-and-Hbase-table/m-p/148990#M111516</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;You can definitely load data into HDFS and then into HBase through Hive. You can also query HBase through Hive using the HBase storage handler.&lt;/P&gt;&lt;P&gt;Please refer here for a more detailed explanation: &lt;A href="https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration"&gt;https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If this data is derived from a Hive table, it has a schema, so I would also consider the Hive/Phoenix storage handler: &lt;A href="https://phoenix.apache.org/hive_storage_handler.html"&gt;https://phoenix.apache.org/hive_storage_handler.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;From a performance standpoint, querying HBase through Hive should generally be slower than querying ORC tables. That being said, it depends on the query pattern and the use case.&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Mon, 12 Dec 2016 20:48:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-and-Hbase-table/m-p/148990#M111516</guid>
      <dc:creator>nmaillard1</dc:creator>
      <dc:date>2016-12-12T20:48:02Z</dc:date>
    </item>
    <item>
      <title>Re: Hive and Hbase table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-and-Hbase-table/m-p/148991#M111517</link>
      <description>&lt;P&gt;Thanks &lt;A rel="user" href="https://community.cloudera.com/users/131/nmaillard.html" nodeid="131"&gt;@nmaillard&lt;/A&gt;. And how about ACID performance?&lt;/P&gt;</description>
      <pubDate>Tue, 13 Dec 2016 11:43:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-and-Hbase-table/m-p/148991#M111517</guid>
      <dc:creator>diablo2</dc:creator>
      <dc:date>2016-12-13T11:43:13Z</dc:date>
    </item>
    <item>
      <title>Re: Hive and Hbase table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-and-Hbase-table/m-p/148992#M111518</link>
      <description>&lt;DIV&gt;&lt;A rel="user" href="https://community.cloudera.com/users/13286/huangdengke.html" nodeid="13286"&gt;@Huahua Wei&lt;/A&gt; &lt;/DIV&gt;&lt;DIV&gt;What is your use case? What type of data? Hive ACID performance will likely be slower than Hive on top of HBase, specifically if you access data using the HBase row key.&lt;/DIV&gt;&lt;DIV&gt;Before I recommend Hive/ORC vs HBase, I'd like to understand your use case better. Here is what I say about HBase:&lt;/DIV&gt;&lt;DIV&gt;When to use HBase:&lt;/DIV&gt;&lt;DIV&gt;• Storing large amounts of data (TB/PB)&lt;/DIV&gt;&lt;DIV&gt;• High throughput for a large number of requests&lt;/DIV&gt;&lt;DIV&gt;• Storing unstructured or variable column data&lt;/DIV&gt;&lt;DIV&gt;• Big Data with random reads and writes&lt;/DIV&gt;&lt;DIV&gt;• Well suited for sparse rows where the number of columns varies&lt;/DIV&gt;&lt;DIV&gt;• Highly available, scalable (since it runs on HDFS)&lt;/DIV&gt;&lt;DIV&gt;When NOT to use HBase:&lt;/DIV&gt;&lt;DIV&gt;• Only use HBase with Big Data problems&lt;/DIV&gt;&lt;DIV&gt;• If you have data for only one or two nodes, HBase is likely not the tool you should be using to begin with.&lt;/DIV&gt;&lt;DIV&gt;• Data is read straight through files&lt;/DIV&gt;&lt;DIV&gt;• Data is written all at once or appended as new files&lt;/DIV&gt;&lt;DIV&gt;• No random reads or writes&lt;/DIV&gt;&lt;DIV&gt;• Access patterns of the data are ill-defined&lt;/DIV&gt;</description>
      <pubDate>Tue, 13 Dec 2016 12:01:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-and-Hbase-table/m-p/148992#M111518</guid>
      <dc:creator>mqureshi</dc:creator>
      <dc:date>2016-12-13T12:01:04Z</dc:date>
    </item>
    <item>
      <title>Re: Hive and Hbase table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-and-Hbase-table/m-p/148993#M111519</link>
      <description>&lt;P&gt;@mqureshi, thanks for your response.&lt;/P&gt;&lt;P&gt;We use Sqoop to import data from Oracle tables into HDFS (as Hive external tables), and then insert it into ORC tables in Hive to support data analytics. Our Hive does not currently have ACID turned on. Most of the tables are currently smaller than 1 TB. Now there is a requirement to update the imported table data in Hive, because the source data has been updated. I searched the web and found that ACID reportedly does not perform very well on updates, and that ACID tables are also not recognized outside of Hive (e.g. by Spark). We are looking for the best-performing approach. So I am considering implementing it using the &lt;STRONG&gt;HBase storage handler&lt;/STRONG&gt; or sqoop merge?&lt;/P&gt;</description>
      <pubDate>Tue, 13 Dec 2016 14:36:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-and-Hbase-table/m-p/148993#M111519</guid>
      <dc:creator>diablo2</dc:creator>
      <dc:date>2016-12-13T14:36:19Z</dc:date>
    </item>
    <item>
      <title>Re: Hive and Hbase table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-and-Hbase-table/m-p/148994#M111520</link>
      <description>&lt;P&gt;Our HDP 2.5 Phoenix version is 4.7.&lt;/P&gt;</description>
      <pubDate>Tue, 13 Dec 2016 14:37:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-and-Hbase-table/m-p/148994#M111520</guid>
      <dc:creator>diablo2</dc:creator>
      <dc:date>2016-12-13T14:37:32Z</dc:date>
    </item>
    <item>
      <title>Re: Hive and Hbase table</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-and-Hbase-table/m-p/148995#M111521</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/13286/huangdengke.html" nodeid="13286"&gt;@Huahua Wei&lt;/A&gt; &lt;/P&gt;&lt;P&gt;HBaseStorageHandler is what is required to read HBase tables from Hive. At the end of the day, you first have to create and manage the HBase tables and then use Hive on top of them. Since you are going to be doing updates, this might be the best way to go about it, but I would strongly recommend looking at the following approach first. The reason is probably my personal preference for not using HBase until it is required, as it is complex and the skill set required to implement it successfully is difficult to find. That being said, in your use case, if you don't like the following approach, I'd prefer HBase over Hive ACID.&lt;/P&gt;&lt;P&gt;&lt;A href="http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/" target="_blank"&gt;http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/&lt;/A&gt;&lt;/P&gt;</description>
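To illustrate the general idea of applying source updates without HBase or Hive ACID, here is a minimal Python sketch of merging a base snapshot with a batch of changed rows and keeping only the latest version of each record. It is only an illustration under stated assumptions, not the linked article's implementation: the function name reconcile and the column names id and modified_date are hypothetical, and in Hive this logic would normally be written as a query or view over the base and incremental tables.

```python
from datetime import date


def reconcile(base_rows, incremental_rows, key="id", modified="modified_date"):
    """Return one row per key, keeping the most recently modified version.

    The key and modified column names are assumptions for this sketch.
    """
    latest = {}
    # Base rows are visited first, so an incremental row with an equal or
    # newer timestamp replaces the base row for the same key.
    for row in list(base_rows) + list(incremental_rows):
        current = latest.get(row[key])
        if current is None or row[modified] >= current[modified]:
            latest[row[key]] = row
    return sorted(latest.values(), key=lambda r: r[key])


# Hypothetical sample data: one unchanged row, one updated row, one new row.
base = [
    {"id": 1, "name": "alice", "modified_date": date(2016, 12, 1)},
    {"id": 2, "name": "bob", "modified_date": date(2016, 12, 1)},
]
incremental = [
    {"id": 2, "name": "robert", "modified_date": date(2016, 12, 12)},
    {"id": 3, "name": "carol", "modified_date": date(2016, 12, 12)},
]

reconciled = reconcile(base, incremental)
for row in reconciled:
    print(row["id"], row["name"], row["modified_date"])
```

The reconciled result can then replace the base table in a periodic batch job, which keeps the data in plain ORC tables that engines other than Hive can read.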
      <pubDate>Tue, 13 Dec 2016 23:54:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-and-Hbase-table/m-p/148995#M111521</guid>
      <dc:creator>mqureshi</dc:creator>
      <dc:date>2016-12-13T23:54:38Z</dc:date>
    </item>
  </channel>
</rss>

