<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Underlying HBASE Table is taking 30+ minutes for small queries triggered from hive . in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Underlying-HBASE-Table-is-taking-30-minutes-for-small/m-p/224626#M84609</link>
    <description>&lt;A rel="user" href="https://community.cloudera.com/users/98160/lenu.html" nodeid="98160"&gt;@Lenu K&lt;/A&gt;&lt;P&gt;We can do export to Hive ORC as follows:&lt;/P&gt;&lt;PRE&gt;hive&amp;gt; Create table &amp;lt;db_name&amp;gt;.&amp;lt;orc_table_name&amp;gt; stored as orc as select * from &amp;lt;db_name&amp;gt;.&amp;lt;hbase_hive_table&amp;gt;;&lt;/PRE&gt;&lt;P&gt;The above &lt;A href="https://www.dummies.com/programming/big-data/hadoop/how-to-use-hives-create-table-as-select-ctas/" target="_blank"&gt;CTAS&lt;/A&gt; is generic statement even you can create a partitioned table (or) use distribute by sort by to create files in the directories.&lt;/P&gt;</description>
    <pubDate>Sun, 28 Oct 2018 23:29:49 GMT</pubDate>
    <dc:creator>Shu_ashu</dc:creator>
    <dc:date>2018-10-28T23:29:49Z</dc:date>
    <item>
      <title>Underlying HBASE Table is taking 30+ minutes for small queries triggered from hive .</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Underlying-HBASE-Table-is-taking-30-minutes-for-small/m-p/224623#M84606</link>
      <description>&lt;P&gt;Hello everybody,&lt;/P&gt;&lt;P&gt;We have an HBase table with around 10 million records. When we query it through the Hive integration, it takes more than 30 minutes to produce results. If we run the query directly in HBase, it is fast. Is there any way to manage this situation?&lt;/P&gt;&lt;P&gt;Or:&lt;/P&gt;&lt;P&gt;1. Can I export all the data from HBase to Hive?&lt;/P&gt;&lt;P&gt;2. How can we avoid full scans of HBase tables from Hive?&lt;/P&gt;&lt;P&gt;Sorry for the basic questions.&lt;/P&gt;</description>
      <pubDate>Sun, 28 Oct 2018 14:28:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Underlying-HBASE-Table-is-taking-30-minutes-for-small/m-p/224623#M84606</guid>
      <dc:creator>lenu</dc:creator>
      <dc:date>2018-10-28T14:28:57Z</dc:date>
    </item>
    <item>
      <title>Re: Underlying HBASE Table is taking 30+ minutes for small queries triggered from hive .</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Underlying-HBASE-Table-is-taking-30-minutes-for-small/m-p/224624#M84607</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/98160/lenu.html" nodeid="98160"&gt;@Lenu K&lt;/A&gt;&lt;P&gt;One way to &lt;STRONG&gt;avoid full table scans is by using RowKey&lt;/STRONG&gt; in your &lt;STRONG&gt;hive filter query&lt;/STRONG&gt; and if you are filtering out another columns(not  only row key) then it would be &lt;STRONG&gt;a lot more efficient if you export all HBase table data into Hive-ORC table&lt;/STRONG&gt; then run all your queries on the exported table.&lt;/P&gt;&lt;P&gt;Refer to &lt;A href="https://www.linkedin.com/pulse/performance-tuning-hbase-part-1-rowkey-crux-kuldeep-deshpande/" target="_blank"&gt;this&lt;/A&gt; and &lt;A href="https://stackoverflow.com/questions/30074734/tuning-hive-queries-that-uses-underlying-hbase-table" target="_blank"&gt;this&lt;/A&gt; links for tuning up the Queries in case of HBase-Hive table.&lt;/P&gt;&lt;P&gt;-&lt;/P&gt;&lt;P&gt;If the Answer helped to resolve your issue, &lt;STRONG&gt;Click on Accept button below to accept the answer.&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 28 Oct 2018 20:52:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Underlying-HBASE-Table-is-taking-30-minutes-for-small/m-p/224624#M84607</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2018-10-28T20:52:47Z</dc:date>
    </item>
    <item>
      <title>Re: Underlying HBASE Table is taking 30+ minutes for small queries triggered from hive .</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Underlying-HBASE-Table-is-taking-30-minutes-for-small/m-p/224625#M84608</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/18929/yaswanthmuppireddy.html" nodeid="18929"&gt;@Shu&lt;/A&gt;&lt;P&gt; Do you have steps to export the HBASE to Hive ORC table. I have tried the performance tuning already it didnt come up properly. Thank you very much helping &lt;/P&gt;</description>
      <pubDate>Sun, 28 Oct 2018 22:24:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Underlying-HBASE-Table-is-taking-30-minutes-for-small/m-p/224625#M84608</guid>
      <dc:creator>lenu</dc:creator>
      <dc:date>2018-10-28T22:24:16Z</dc:date>
    </item>
    <item>
      <title>Re: Underlying HBASE Table is taking 30+ minutes for small queries triggered from hive .</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Underlying-HBASE-Table-is-taking-30-minutes-for-small/m-p/224626#M84609</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/98160/lenu.html" nodeid="98160"&gt;@Lenu K&lt;/A&gt;&lt;P&gt;We can do export to Hive ORC as follows:&lt;/P&gt;&lt;PRE&gt;hive&amp;gt; Create table &amp;lt;db_name&amp;gt;.&amp;lt;orc_table_name&amp;gt; stored as orc as select * from &amp;lt;db_name&amp;gt;.&amp;lt;hbase_hive_table&amp;gt;;&lt;/PRE&gt;&lt;P&gt;The above &lt;A href="https://www.dummies.com/programming/big-data/hadoop/how-to-use-hives-create-table-as-select-ctas/" target="_blank"&gt;CTAS&lt;/A&gt; is generic statement even you can create a partitioned table (or) use distribute by sort by to create files in the directories.&lt;/P&gt;</description>
      <pubDate>Sun, 28 Oct 2018 23:29:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Underlying-HBASE-Table-is-taking-30-minutes-for-small/m-p/224626#M84609</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2018-10-28T23:29:49Z</dc:date>
    </item>
    <item>
      <title>Re: Underlying HBASE Table is taking 30+ minutes for small queries triggered from hive .</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Underlying-HBASE-Table-is-taking-30-minutes-for-small/m-p/224627#M84610</link>
      <description>&lt;P&gt;Simple and cool. However, the table is updated every other hour, and the CTAS takes a very long time for 900 GB. The requirement is to store terabytes of data in the first load, then apply about 100 GB of daily incremental changes (inserts/updates/deletes) in HBase, and make the data available to business analysts. A single query takes more than 40 minutes to return, while loading the data into HBase takes only 10 to 20 minutes. Is there any other approach, Shu? Kindly give me some spark.&lt;/P&gt;</description>
      <pubDate>Sun, 28 Oct 2018 23:45:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Underlying-HBASE-Table-is-taking-30-minutes-for-small/m-p/224627#M84610</guid>
      <dc:creator>lenu</dc:creator>
      <dc:date>2018-10-28T23:45:15Z</dc:date>
    </item>
    <item>
      <title>Re: Underlying HBASE Table is taking 30+ minutes for small queries triggered from hive .</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Underlying-HBASE-Table-is-taking-30-minutes-for-small/m-p/224628#M84611</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/98160/lenu.html" nodeid="98160"&gt;@Lenu K&lt;/A&gt;&lt;P&gt;&lt;STRONG&gt;1.Using Spark-Hbase Connector:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;You can use &lt;A href="https://github.com/hortonworks-spark/shc" target="_blank"&gt;Spark-Hbase&lt;/A&gt; connector to get data from Hbase table using Spark and store until what time you have pulled of records from the HBase table.&lt;/P&gt;&lt;P&gt;For the next run get the state and use it as lower bound and current time as upper bound pull the data from Hbase table and insert into Hive table.&lt;/P&gt;&lt;P&gt;By using this way we are not creating full snapshot of HBase table as Hive orc table instead we are incrementally loading the data into hive table and use hive table data for analytics.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2.Using Hive Merge strategy:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;You can use Hive Merge strategy introduced in HDP-2.6 but for this case your hive table needs to be Transactional enabled.&lt;/P&gt;&lt;PRE&gt;merge into transactional_table using &amp;lt;hbase_hive_table&amp;gt;... 
etc&lt;/PRE&gt;&lt;P&gt;for more details refer to &lt;A href="https://community.hortonworks.com/articles/97113/hive-acid-merge-by-example.html" target="_blank"&gt;this&lt;/A&gt; link.&lt;/P&gt;&lt;P&gt;another way using hive would be using CTAS as mentioned above in comments for the first run it will take more time but from the 2 run you can only pull the incremental records from HBase table and load into Hive orc table(if you are following this approach then using &lt;STRONG&gt;spark-hbase connector&lt;/STRONG&gt; will give more performence.)&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;3.Using Apache-Phoenix:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Using &lt;A href="https://www.slideshare.net/HadoopSummit/apache-phoenix-apache-hbase" target="_blank"&gt;Apache phoenix&lt;/A&gt; to get the data from &lt;A href="https://phoenix.apache.org/faq.html#How_I_map_Phoenix_table_to_an_existing_HBase_table" target="_blank"&gt;HBase table as Phoenix table&lt;/A&gt; will be pointed to HBase table and allows to run sql queries on top of HBase stored data.&lt;/P&gt;&lt;P&gt;Difference between &lt;A href="https://www.quora.com/How-is-Apache-Phoenix-different-from-Hive-Hbase-integration" target="_blank"&gt;Hive-Hbase integration vs Phoenix-Hbase integration&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 29 Oct 2018 01:17:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Underlying-HBASE-Table-is-taking-30-minutes-for-small/m-p/224628#M84611</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2018-10-29T01:17:42Z</dc:date>
    </item>
  </channel>
</rss>

