<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Apache Hive - What kind of activities are normal in the Hive in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Hive-What-kind-of-activities-are-normal-in-the-Hive/m-p/170045#M37291</link>
    <description>&lt;P&gt; &lt;A rel="user" href="https://community.cloudera.com/users/11031/m2014227.html" nodeid="11031"&gt;@Johnny Fugers&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Hive is great for typical BI queries.  It scales to very large data volumes.  When you get into the area of updates, I would rather do those activities in Phoenix and serve the end results back to Hive for BI queries.  Hive ACID is coming soon.  Until that is available, I would use the Phoenix-&amp;gt;Hive route.  Use Pig for ETL.  Where it gets interesting is using an MPP database on Hadoop; that is where HAWQ comes in.  It is a good low-latency DB engine that provides some of the benefits of both Hive and Phoenix.  It does not cover all Hive &amp;amp; Phoenix capabilities, but I would say it is a good happy medium.  I hope that helps.  As you go further into your journey, you will start to ask questions about security and governance.  For security, start with Ranger &amp;amp; Knox; for governance, start with Falcon/Atlas/Ranger.&lt;/P&gt;</description>
    <pubDate>Tue, 09 Aug 2016 21:48:47 GMT</pubDate>
    <dc:creator>sunile_manjee</dc:creator>
    <dc:date>2016-08-09T21:48:47Z</dc:date>
    <item>
      <title>Apache Hive - What kind of activities are normal in the Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Hive-What-kind-of-activities-are-normal-in-the-Hive/m-p/170043#M37289</link>
      <description>&lt;P&gt;Hi,

I've been working with Hadoop and testing a lot of the components of its ecosystem. Now I'm doing a small project that consists of two phases:
a) Data cleansing
b) KPI definition

I already do step a) in Apache Pig. Then I load the data into Apache Hive. So, as in all the other projects I have worked on, I only see Apache Hive as a data repository.

Basically, I just use Hive to load the data after the data cleansing step and then use it as a regular data source, nothing more.
&lt;/P&gt;&lt;P&gt;Since I'm very new to the Big Data/Hadoop world, I would like to know what kinds of jobs/activities are normally done with Apache Hive.

Sorry for the ignorance :)

Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 09 Aug 2016 20:50:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Hive-What-kind-of-activities-are-normal-in-the-Hive/m-p/170043#M37289</guid>
      <dc:creator>m2014227</dc:creator>
      <dc:date>2016-08-09T20:50:45Z</dc:date>
    </item>
    <item>
      <title>Re: Apache Hive - What kind of activities are normal in the Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Hive-What-kind-of-activities-are-normal-in-the-Hive/m-p/170044#M37290</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/11031/m2014227.html" nodeid="11031"&gt;@Johnny Fugers&lt;/A&gt; &lt;/P&gt;&lt;P&gt;In many scenarios, Hive is used much like an RDBMS, but with better scalability and flexibility.  Hive scales to petabytes of data, which is difficult for a typical RDBMS.&lt;/P&gt;&lt;P&gt;One of the big benefits Hive provides is a low barrier to entry for end users.  More specifically, users can use standard SQL to interact with the data.  One of the most common use cases is to offload many of the data processing tasks done in a typical RDBMS and have them done in Hive instead.  This frees up resources on those systems for more time-sensitive tasks.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Aug 2016 21:15:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Hive-What-kind-of-activities-are-normal-in-the-Hive/m-p/170044#M37290</guid>
      <dc:creator>myoung</dc:creator>
      <dc:date>2016-08-09T21:15:16Z</dc:date>
    </item>
    <item>
      <title>Re: Apache Hive - What kind of activities are normal in the Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Hive-What-kind-of-activities-are-normal-in-the-Hive/m-p/170045#M37291</link>
      <description>&lt;P&gt; &lt;A rel="user" href="https://community.cloudera.com/users/11031/m2014227.html" nodeid="11031"&gt;@Johnny Fugers&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Hive is great for typical BI queries.  It scales to very large data volumes.  When you get into the area of updates, I would rather do those activities in Phoenix and serve the end results back to Hive for BI queries.  Hive ACID is coming soon.  Until that is available, I would use the Phoenix-&amp;gt;Hive route.  Use Pig for ETL.  Where it gets interesting is using an MPP database on Hadoop; that is where HAWQ comes in.  It is a good low-latency DB engine that provides some of the benefits of both Hive and Phoenix.  It does not cover all Hive &amp;amp; Phoenix capabilities, but I would say it is a good happy medium.  I hope that helps.  As you go further into your journey, you will start to ask questions about security and governance.  For security, start with Ranger &amp;amp; Knox; for governance, start with Falcon/Atlas/Ranger.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Aug 2016 21:48:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-Hive-What-kind-of-activities-are-normal-in-the-Hive/m-p/170045#M37291</guid>
      <dc:creator>sunile_manjee</dc:creator>
      <dc:date>2016-08-09T21:48:47Z</dc:date>
    </item>
  </channel>
</rss>

