<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to load existing partitioned parquet data in hive table from S3 bucket? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-to-load-existing-partitoned-parquet-data-in-hive-table/m-p/363501#M239004</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/103361"&gt;@codiste_m&lt;/a&gt; By default, Hive uses static partitioning. Hive also supports dynamic partitioning, but I am not sure how well that works with data already sitting in existing folders: dynamic partitioning derives the correct partitions from the schema and creates the partition folders as the data is inserted into the storage path.&lt;/P&gt;&lt;P&gt;It sounds like you will need to execute a LOAD DATA command for each partition you want to query.&lt;/P&gt;</description>
    <pubDate>Thu, 09 Feb 2023 14:14:36 GMT</pubDate>
    <dc:creator>steven-matison</dc:creator>
    <dc:date>2023-02-09T14:14:36Z</dc:date>
    <item>
      <title>How to load existing partitioned parquet data in hive table from S3 bucket?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-load-existing-partitoned-parquet-data-in-hive-table/m-p/363358#M238966</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I currently have partitioned Parquet data in an S3 bucket, which I want to bind dynamically to a Hive table.&lt;/P&gt;&lt;P&gt;I am able to achieve this for a single partition, but I need help loading the data from all partitions of the existing partitioned Parquet data in the S3 bucket into the table.&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Feb 2023 07:18:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-load-existing-partitoned-parquet-data-in-hive-table/m-p/363358#M238966</guid>
      <dc:creator>codiste_m</dc:creator>
      <dc:date>2023-02-08T07:18:21Z</dc:date>
    </item>
    <item>
      <title>Re: How to load existing partitioned parquet data in hive table from S3 bucket?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-load-existing-partitoned-parquet-data-in-hive-table/m-p/363501#M239004</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/103361"&gt;@codiste_m&lt;/a&gt; By default, Hive uses static partitioning. Hive also supports dynamic partitioning, but I am not sure how well that works with data already sitting in existing folders: dynamic partitioning derives the correct partitions from the schema and creates the partition folders as the data is inserted into the storage path.&lt;/P&gt;&lt;P&gt;It sounds like you will need to execute a LOAD DATA command for each partition you want to query.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Feb 2023 14:14:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-load-existing-partitoned-parquet-data-in-hive-table/m-p/363501#M239004</guid>
      <dc:creator>steven-matison</dc:creator>
      <dc:date>2023-02-09T14:14:36Z</dc:date>
    </item>
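The answer above mentions enabling dynamic partitioning and running a load command per partition. A minimal sketch of both options in HiveQL, assuming illustrative table, column, and bucket names (`sales_partitioned`, `sale_date`, `s3a://my-bucket/...` are all hypothetical):

```sql
-- Option 1: dynamic partitioning on insert. These two settings must be
-- enabled before Hive will derive partition values from the data itself.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The partition column (sale_date) must come last in the SELECT list;
-- Hive creates the matching partition folders as rows are written.
INSERT INTO TABLE sales_partitioned PARTITION (sale_date)
SELECT id, amount, sale_date
FROM sales_staging;

-- Option 2: register one existing S3 folder as a partition explicitly,
-- repeated once per partition you want to query.
ALTER TABLE sales_partitioned
  ADD IF NOT EXISTS PARTITION (sale_date = '2023-02-08')
  LOCATION 's3a://my-bucket/sales/sale_date=2023-02-08/';
```

Option 2 is what "execute a load data command for all partitions" amounts to for data that is already in place: no rows are copied, the partition metadata is simply pointed at the existing folder.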
    <item>
      <title>Re: How to load existing partitioned parquet data in hive table from S3 bucket?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-load-existing-partitoned-parquet-data-in-hive-table/m-p/376951#M243078</link>
      <description>&lt;P&gt;If the partition data is laid out like below:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;lt;s3:bucket&amp;gt;/&amp;lt;some_location&amp;gt;/&amp;lt;part_column&amp;gt;=&amp;lt;part_value&amp;gt;/&amp;lt;filename&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;you can create an external table specifying the above location and run 'MSCK REPAIR TABLE &amp;lt;table_name&amp;gt; SYNC PARTITIONS' to sync the partitions. Validate the data by running some sample SELECT statements.&lt;/P&gt;&lt;P&gt;Once that is done, you can create a new external table over another bucket and run an INSERT statement with dynamic partitioning.&lt;/P&gt;&lt;P&gt;Ref - &lt;A href="https://cwiki.apache.org/confluence/display/hive/dynamicpartitions" target="_blank"&gt;https://cwiki.apache.org/confluence/display/hive/dynamicpartitions&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 27 Sep 2023 14:43:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-load-existing-partitoned-parquet-data-in-hive-table/m-p/376951#M243078</guid>
      <dc:creator>ggangadharan</dc:creator>
      <dc:date>2023-09-27T14:43:12Z</dc:date>
    </item>
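The MSCK REPAIR workflow from the answer above can be sketched as follows. The table, column, and bucket names are illustrative assumptions; only the directory layout (`.../sale_date=<value>/file.parquet`) must match your actual data:

```sql
-- External table pointed at the existing S3 prefix; the partition column
-- appears in PARTITIONED BY, not in the regular column list.
CREATE EXTERNAL TABLE sales (
  id     BIGINT,
  amount DOUBLE
)
PARTITIONED BY (sale_date STRING)
STORED AS PARQUET
LOCATION 's3a://my-bucket/sales/';

-- Scan the table location and register every <part_column>=<part_value>
-- folder found there as a partition (SYNC also drops stale ones).
MSCK REPAIR TABLE sales SYNC PARTITIONS;

-- Validate that all partitions were picked up and the data is readable.
SHOW PARTITIONS sales;
SELECT * FROM sales LIMIT 10;
```

After the repair, all existing partition folders are queryable at once, with no per-partition ALTER TABLE or LOAD DATA statements needed.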
  </channel>
</rss>

