<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Data Warehouse design in Hive in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Data-Wareshouse-design-in-Hive/m-p/391970#M247889</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/111602"&gt;@APentyala&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;1. Data Modeling Design: Which model is best suited for a Lakehouse implementation, star schema or snowflake schema?&lt;/P&gt;&lt;P&gt;Ans: We don't have those designs, or we are not aware of them.&lt;/P&gt;&lt;P&gt;2. We are using CDP (Private) and need to implement updates and deletes (SCD Type 1 &amp;amp; 2). Are there any limitations with Hive external tables?&lt;/P&gt;&lt;P&gt;Ans: There are no limitations for EXTERNAL tables. Are you using HDFS or Isilon for storage?&lt;/P&gt;&lt;P&gt;3. Are there any pre-built dimension models or ER models available for reference?&lt;/P&gt;&lt;P&gt;Ans: We don't have anything as such.&lt;/P&gt;</description>
    <pubDate>Mon, 19 Aug 2024 13:43:57 GMT</pubDate>
    <dc:creator>asish</dc:creator>
    <dc:date>2024-08-19T13:43:57Z</dc:date>
    <item>
      <title>Data Warehouse design in Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-Wareshouse-design-in-Hive/m-p/391760#M247762</link>
      <description>&lt;P&gt;Hello Everyone,&lt;/P&gt;&lt;P&gt;We are developing a data lakehouse using Hive for the banking and financial sector. We would appreciate your insights on the following:&lt;/P&gt;&lt;P&gt;1. Which data modeling approach is recommended for this domain?&lt;BR /&gt;2. Are there any sample models available for reference?&lt;BR /&gt;3. What best practices should we follow to ensure data integrity and performance?&lt;BR /&gt;4. How can we efficiently manage large-scale data ingestion and processing?&lt;BR /&gt;5. Are there any specific challenges or pitfalls we should be aware of when implementing a lakehouse in this sector?&lt;/P&gt;&lt;P&gt;Your expertise and guidance would be greatly appreciated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 06:27:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-Wareshouse-design-in-Hive/m-p/391760#M247762</guid>
      <dc:creator>APentyala</dc:creator>
      <dc:date>2026-04-21T06:27:13Z</dc:date>
    </item>
    <item>
      <title>Re: Data Warehouse design in Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-Wareshouse-design-in-Hive/m-p/391761#M247763</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/111602"&gt;@APentyala&lt;/a&gt;&amp;nbsp;Welcome to the Cloudera Community!&lt;BR /&gt;&lt;BR /&gt;To help you get the best possible solution, I have tagged our CDW experts&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/82698"&gt;@smruti&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/71090"&gt;@asish&lt;/a&gt;&amp;nbsp; who may be able to assist you further.&lt;BR /&gt;&lt;BR /&gt;Please keep us updated on your post, and we hope you find a satisfactory solution to your query.&lt;/P&gt;</description>
      <pubDate>Wed, 14 Aug 2024 19:17:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-Wareshouse-design-in-Hive/m-p/391761#M247763</guid>
      <dc:creator>DianaTorres</dc:creator>
      <dc:date>2024-08-14T19:17:23Z</dc:date>
    </item>
    <item>
      <title>Re: Data Warehouse design in Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-Wareshouse-design-in-Hive/m-p/391775#M247769</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/111602"&gt;@APentyala&lt;/a&gt;&amp;nbsp;Please find the answers below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. Which data modeling approach is recommended for this domain?&lt;/P&gt;&lt;P&gt;Ans: If you have a large volume of data, we would recommend partitioning or multi-level partitioning. You could also implement bucketing if the data inside a partition is large.&lt;/P&gt;&lt;P&gt;2. Are there any sample models available for reference?&lt;/P&gt;&lt;P&gt;Ans: For a reference on partitioning and bucketing, see &lt;A href="https://www.linkedin.com/pulse/what-partitioning-vs-bucketing-apache-hive-shrivastava/" target="_blank"&gt;https://www.linkedin.com/pulse/what-partitioning-vs-bucketing-apache-hive-shrivastava/&lt;/A&gt;&lt;BR /&gt;You could create a new table and perform a CTAS with dynamic partitioning from the existing table.&lt;BR /&gt;Reference: &lt;A href="https://www.geeksforgeeks.org/overview-of-dynamic-partition-in-hive/" target="_blank"&gt;https://www.geeksforgeeks.org/overview-of-dynamic-partition-in-hive/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;3. What best practices should we follow to ensure data integrity and performance?&lt;/P&gt;&lt;P&gt;Ans: Please follow the best practices below:&lt;/P&gt;&lt;P&gt;a. Partition and bucket the data.&lt;BR /&gt;b. You could use Iceberg tables, which would significantly reduce the load on the Metastore, if you are using CDP Public Cloud or CDP Private Cloud (ECS/OpenShift).&lt;BR /&gt;c. Use ORC/Parquet.&lt;BR /&gt;d. Use EXTERNAL tables if you don't perform updates/deletes, as reading an external table is faster.&lt;/P&gt;&lt;P&gt;4. How can we efficiently manage large-scale data ingestion and processing?&lt;/P&gt;&lt;P&gt;Ans: The model is as follows:&lt;BR /&gt;Kafka/Spark Streaming: Ingestion&lt;BR /&gt;Spark: Data modelling&lt;BR /&gt;Hive: Warehousing, where you extract the data&lt;/P&gt;&lt;P&gt;Please be specific about the use case.&lt;/P&gt;&lt;P&gt;5. Are there any specific challenges or pitfalls we should be aware of when implementing a lakehouse in this sector?&lt;/P&gt;&lt;P&gt;Ans: There should be no challenges. Please provide more details on this.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Aug 2024 04:08:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-Wareshouse-design-in-Hive/m-p/391775#M247769</guid>
      <dc:creator>asish</dc:creator>
      <dc:date>2024-08-15T04:08:04Z</dc:date>
    </item>
    <item>
      <title>Re: Data Warehouse design in Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-Wareshouse-design-in-Hive/m-p/391809#M247786</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/111602"&gt;@APentyala&lt;/a&gt;&amp;nbsp;Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.&amp;nbsp; Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Aug 2024 19:55:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-Wareshouse-design-in-Hive/m-p/391809#M247786</guid>
      <dc:creator>DianaTorres</dc:creator>
      <dc:date>2024-08-16T19:55:45Z</dc:date>
    </item>
    <item>
      <title>Re: Data Warehouse design in Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-Wareshouse-design-in-Hive/m-p/391871#M247813</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/71090"&gt;@asish&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Thank you for your answers. Could you please provide more details on the following:&lt;/P&gt;&lt;P&gt;1. Data Modeling Design: Which model is best suited for a Lakehouse implementation, star schema or snowflake schema?&lt;/P&gt;&lt;P&gt;2. We are using CDP (Private) and need to implement updates and deletes (SCD Type 1 &amp;amp; 2). Are there any limitations with Hive external tables?&lt;/P&gt;&lt;P&gt;3. Are there any pre-built dimension models or ER models available for reference?&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2024 18:54:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-Wareshouse-design-in-Hive/m-p/391871#M247813</guid>
      <dc:creator>APentyala</dc:creator>
      <dc:date>2024-08-18T18:54:43Z</dc:date>
    </item>
    <item>
      <title>Re: Data Warehouse design in Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-Wareshouse-design-in-Hive/m-p/391970#M247889</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/111602"&gt;@APentyala&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;1. Data Modeling Design: Which model is best suited for a Lakehouse implementation, star schema or snowflake schema?&lt;/P&gt;&lt;P&gt;Ans: We don't have those designs, or we are not aware of them.&lt;/P&gt;&lt;P&gt;2. We are using CDP (Private) and need to implement updates and deletes (SCD Type 1 &amp;amp; 2). Are there any limitations with Hive external tables?&lt;/P&gt;&lt;P&gt;Ans: There are no limitations for EXTERNAL tables. Are you using HDFS or Isilon for storage?&lt;/P&gt;&lt;P&gt;3. Are there any pre-built dimension models or ER models available for reference?&lt;/P&gt;&lt;P&gt;Ans: We don't have anything as such.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Aug 2024 13:43:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-Wareshouse-design-in-Hive/m-p/391970#M247889</guid>
      <dc:creator>asish</dc:creator>
      <dc:date>2024-08-19T13:43:57Z</dc:date>
    </item>
    <item>
      <title>Re: Data Warehouse design in Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Data-Wareshouse-design-in-Hive/m-p/391990#M247896</link>
      <description>&lt;P&gt;Yes, we are using HDFS for storage. Are there any limitations?&lt;/P&gt;</description>
      <pubDate>Mon, 19 Aug 2024 20:17:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Data-Wareshouse-design-in-Hive/m-p/391990#M247896</guid>
      <dc:creator>APentyala</dc:creator>
      <dc:date>2024-08-19T20:17:13Z</dc:date>
    </item>
  </channel>
</rss>

