Created on 04-06-2017 03:19 PM - edited 09-16-2022 04:25 AM
Wondering if there’s a better way to do this type of partitioning where I want to partition the data based on First Name stating with a-z and Last Name starting with a-z creating 676 buckets due to how the table is queried and volume. I wasn’t sure if there is someway to dynamically create the partition. What I did was create (in the example below I only left it to 3 partitions for the sake of typing) partitions based on the First letter of the names. Then select from my raw original table and inserted into the partition point based on the criteria. Is this the most efficient way of doing this when querying the table by firstname and lastname or are there better ways to partition this table or improve query performance?
Example:
CREATE EXTERNAL TABLE name_partition(
FName STRING,
LName STRING,
Notes String
)
PARTITIONED BY (FirstNameLetter STRING, LastNameLetter STRING)
STORED AS PARQUET
LOCATION '/data/shared_workspace/';
ALTER TABLE name_partition
ADD PARTITION (FirstNameLetter = 'a', LastNameLetter = 'a');
ALTER TABLE name_partition
ADD PARTITION (FirstNameLetter = 'a', LastNameLetter = 'b');
ALTER TABLE name_partition
ADD PARTITION (FirstNameLetter = 'a', LastNameLetter = 'c');
FROM name_raw
INSERT OVERWRITE TABLE name_partition PARTITION(FirstNameLetter ='a', LastNameLetter = 'a')
SELECT
FName,
Lname,
Notes
WHERE lower(Fname) like 'a%' AND lower(Lname)like 'a%'
INSERT OVERWRITE TABLE name_partition PARTITION(FirstNameLetter ='a', LastNameLetter = 'b')
SELECT
FName,
Lname,
Notes
WHERE lower(Fname) like 'a%' AND lower(Lname)like 'b%';
INSERT OVERWRITE TABLE name_partition PARTITION(FirstNameLetter ='a', LastNameLetter = 'c')
SELECT
FName,
Lname,
Notes
WHERE lower(Fname) like 'a%' AND lower(Lname)like 'c%';
Created 04-07-2017 08:02 AM
Dynamic Partitioning is an alternative to this per reading Cloudera Partitioning
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
INSERT OVERWRITE TABLE T PARTITION (FirstNameLetter, LastNameLetter) SELECT ...., lower(substr(name, 0, 1)), lower(substr(name, 1, 1))
Created 04-14-2017 07:01 AM
To set this in Impala to execute either as a SQL file or hue you would set the variables as shown in the first 2 lines below. Below is an example of using static partitioning and dynamic partitioning together where the EffDTM and FirstNameLetter are static partitioning and LastNameLetter uses dynamic partitioning.
set var:hive.exec.dynamic.partition.mode=nonstrict;
set var:hive.exec.dynamic.partition=true;
INSERT OVERWRITE TABLE person_by_name PARTITION (EffDTM='201701', FirstNameLetter='a', LastNameLetter)
firstname,
lastname,
..
..
lower(substr(lastname, 0, 1)) as LastNameLetter
FROM person
WHERE lower(substr(firstname, 0, 1)) = 'a';
Created 04-07-2017 08:02 AM
Dynamic Partitioning is an alternative to this per reading Cloudera Partitioning
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
INSERT OVERWRITE TABLE T PARTITION (FirstNameLetter, LastNameLetter) SELECT ...., lower(substr(name, 0, 1)), lower(substr(name, 1, 1))
Created 04-14-2017 07:01 AM
To set this in Impala to execute either as a SQL file or hue you would set the variables as shown in the first 2 lines below. Below is an example of using static partitioning and dynamic partitioning together where the EffDTM and FirstNameLetter are static partitioning and LastNameLetter uses dynamic partitioning.
set var:hive.exec.dynamic.partition.mode=nonstrict;
set var:hive.exec.dynamic.partition=true;
INSERT OVERWRITE TABLE person_by_name PARTITION (EffDTM='201701', FirstNameLetter='a', LastNameLetter)
firstname,
lastname,
..
..
lower(substr(lastname, 0, 1)) as LastNameLetter
FROM person
WHERE lower(substr(firstname, 0, 1)) = 'a';