<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Impala Partitioning by Name question in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-Partitioning-by-Name-question/m-p/53302#M59079</link>
    <description>&lt;P&gt;Wondering if there’s a better way to do this type of partitioning where I want to partition the data based on First Name stating with a-z and Last Name starting with a-z creating 676 buckets due to how the table is queried and volume.&amp;nbsp;&amp;nbsp; I wasn’t sure if there is someway to dynamically create the partition. What I did was create (in the example below I only left it to 3 partitions for the sake of typing) partitions based on the First letter of the names. Then select from my raw original table and inserted into the partition point based on the criteria. Is this the most efficient way of doing this when querying the table by firstname and lastname or are there better ways to partition this table or improve query performance?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Example:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;CREATE EXTERNAL TABLE name_partition(&lt;/P&gt;&lt;P&gt;FName STRING,&lt;/P&gt;&lt;P&gt;LName STRING,&lt;/P&gt;&lt;P&gt;Notes String&lt;/P&gt;&lt;P&gt;)&lt;/P&gt;&lt;P&gt;PARTITIONED BY (FirstNameLetter STRING, LastNameLetter STRING)&lt;/P&gt;&lt;P&gt;STORED AS PARQUET&lt;/P&gt;&lt;P&gt;LOCATION '/data/shared_workspace/';&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;ALTER TABLE name_partition&lt;/P&gt;&lt;P&gt;ADD PARTITION (FirstNameLetter = 'a', LastNameLetter = 'a');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;ALTER TABLE name_partition&lt;/P&gt;&lt;P&gt;ADD PARTITION (FirstNameLetter = 'a', LastNameLetter = 'b');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;ALTER TABLE name_partition&lt;/P&gt;&lt;P&gt;ADD PARTITION (FirstNameLetter = 'a', LastNameLetter = 'c');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;FROM name_raw&lt;/P&gt;&lt;P&gt;INSERT OVERWRITE TABLE name_partition PARTITION(FirstNameLetter ='a', LastNameLetter = 'a')&lt;/P&gt;&lt;P&gt;SELECT&lt;/P&gt;&lt;P&gt;FName,&lt;/P&gt;&lt;P&gt;Lname,&lt;/P&gt;&lt;P&gt;Notes&lt;/P&gt;&lt;P&gt;WHERE lower(Fname) like 'a%' AND lower(Lname)like 'a%'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;INSERT OVERWRITE TABLE name_partition PARTITION(FirstNameLetter ='a', LastNameLetter = 'b')&lt;/P&gt;&lt;P&gt;SELECT&lt;/P&gt;&lt;P&gt;FName,&lt;/P&gt;&lt;P&gt;Lname,&lt;/P&gt;&lt;P&gt;Notes&lt;/P&gt;&lt;P&gt;WHERE lower(Fname) like 'a%' AND lower(Lname)like 'b%';&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;INSERT OVERWRITE TABLE name_partition PARTITION(FirstNameLetter ='a', LastNameLetter = 'c')&lt;/P&gt;&lt;P&gt;SELECT&lt;/P&gt;&lt;P&gt;FName,&lt;/P&gt;&lt;P&gt;Lname,&lt;/P&gt;&lt;P&gt;Notes&lt;/P&gt;&lt;P&gt;WHERE lower(Fname) like 'a%' AND lower(Lname)like 'c%';&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 11:25:31 GMT</pubDate>
    <dc:creator>peterlandis</dc:creator>
    <dc:date>2022-09-16T11:25:31Z</dc:date>
    <item>
      <title>Impala Partitioning by Name question</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-Partitioning-by-Name-question/m-p/53302#M59079</link>
      <description>&lt;P&gt;Wondering if there’s a better way to do this type of partitioning where I want to partition the data based on First Name stating with a-z and Last Name starting with a-z creating 676 buckets due to how the table is queried and volume.&amp;nbsp;&amp;nbsp; I wasn’t sure if there is someway to dynamically create the partition. What I did was create (in the example below I only left it to 3 partitions for the sake of typing) partitions based on the First letter of the names. Then select from my raw original table and inserted into the partition point based on the criteria. Is this the most efficient way of doing this when querying the table by firstname and lastname or are there better ways to partition this table or improve query performance?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Example:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;CREATE EXTERNAL TABLE name_partition(&lt;/P&gt;&lt;P&gt;FName STRING,&lt;/P&gt;&lt;P&gt;LName STRING,&lt;/P&gt;&lt;P&gt;Notes String&lt;/P&gt;&lt;P&gt;)&lt;/P&gt;&lt;P&gt;PARTITIONED BY (FirstNameLetter STRING, LastNameLetter STRING)&lt;/P&gt;&lt;P&gt;STORED AS PARQUET&lt;/P&gt;&lt;P&gt;LOCATION '/data/shared_workspace/';&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;ALTER TABLE name_partition&lt;/P&gt;&lt;P&gt;ADD PARTITION (FirstNameLetter = 'a', LastNameLetter = 'a');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;ALTER TABLE name_partition&lt;/P&gt;&lt;P&gt;ADD PARTITION (FirstNameLetter = 'a', LastNameLetter = 'b');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;ALTER TABLE name_partition&lt;/P&gt;&lt;P&gt;ADD PARTITION (FirstNameLetter = 'a', LastNameLetter = 'c');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;FROM name_raw&lt;/P&gt;&lt;P&gt;INSERT OVERWRITE TABLE name_partition PARTITION(FirstNameLetter ='a', LastNameLetter = 'a')&lt;/P&gt;&lt;P&gt;SELECT&lt;/P&gt;&lt;P&gt;FName,&lt;/P&gt;&lt;P&gt;Lname,&lt;/P&gt;&lt;P&gt;Notes&lt;/P&gt;&lt;P&gt;WHERE lower(Fname) like 'a%' AND lower(Lname)like 'a%'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;INSERT OVERWRITE TABLE name_partition PARTITION(FirstNameLetter ='a', LastNameLetter = 'b')&lt;/P&gt;&lt;P&gt;SELECT&lt;/P&gt;&lt;P&gt;FName,&lt;/P&gt;&lt;P&gt;Lname,&lt;/P&gt;&lt;P&gt;Notes&lt;/P&gt;&lt;P&gt;WHERE lower(Fname) like 'a%' AND lower(Lname)like 'b%';&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;INSERT OVERWRITE TABLE name_partition PARTITION(FirstNameLetter ='a', LastNameLetter = 'c')&lt;/P&gt;&lt;P&gt;SELECT&lt;/P&gt;&lt;P&gt;FName,&lt;/P&gt;&lt;P&gt;Lname,&lt;/P&gt;&lt;P&gt;Notes&lt;/P&gt;&lt;P&gt;WHERE lower(Fname) like 'a%' AND lower(Lname)like 'c%';&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 11:25:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-Partitioning-by-Name-question/m-p/53302#M59079</guid>
      <dc:creator>peterlandis</dc:creator>
      <dc:date>2022-09-16T11:25:31Z</dc:date>
    </item>
    <item>
      <title>Re: Impala Partitioning by Name question</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-Partitioning-by-Name-question/m-p/53343#M59080</link>
      <description>&lt;P&gt;Dynamic Partitioning is an alternative to this per reading &lt;A href="https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_partitioning.html#partition_static_dynamic" target="_self"&gt;Cloudera Partitioning&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;SET hive.exec.dynamic.partition.mode=nonstrict;&lt;/P&gt;&lt;P&gt;SET hive.exec.dynamic.partition=true;&lt;/P&gt;&lt;P&gt;INSERT OVERWRITE TABLE T PARTITION (FirstNameLetter, LastNameLetter) SELECT ...., lower(substr(name, 0, 1)), lower(substr(name, 1, 1))&lt;/P&gt;</description>
      <pubDate>Fri, 07 Apr 2017 15:02:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-Partitioning-by-Name-question/m-p/53343#M59080</guid>
      <dc:creator>peterlandis</dc:creator>
      <dc:date>2017-04-07T15:02:05Z</dc:date>
    </item>
    <item>
      <title>Re: Impala Partitioning by Name question</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-Partitioning-by-Name-question/m-p/53670#M59081</link>
      <description>&lt;P&gt;To set this in Impala to execute either as a SQL file or hue you would set the variables as shown in the first 2 lines below. &amp;nbsp;Below is an example of using static partitioning and dynamic partitioning together where the EffDTM and FirstNameLetter are static partitioning and LastNameLetter uses dynamic partitioning.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;set var:hive.exec.dynamic.partition.mode=nonstrict;&lt;BR /&gt;set var:hive.exec.dynamic.partition=true;&lt;BR /&gt;INSERT OVERWRITE TABLE person_by_name PARTITION (EffDTM='201701', FirstNameLetter='a', LastNameLetter)&lt;BR /&gt;firstname,&lt;/P&gt;&lt;P&gt;lastname,&lt;/P&gt;&lt;P&gt;..&lt;/P&gt;&lt;P&gt;..&lt;/P&gt;&lt;P&gt;lower(substr(&lt;SPAN&gt;lastname&lt;/SPAN&gt;, 0, 1)) as LastNameLetter&lt;BR /&gt;FROM person&lt;/P&gt;&lt;P&gt;WHERE lower(substr(firstname, 0, 1)) = 'a';&lt;/P&gt;</description>
      <pubDate>Fri, 14 Apr 2017 14:01:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Impala-Partitioning-by-Name-question/m-p/53670#M59081</guid>
      <dc:creator>peterlandis</dc:creator>
      <dc:date>2017-04-14T14:01:52Z</dc:date>
    </item>
  </channel>
</rss>

