Reply
New Contributor
Posts: 9
Registered: ‎03-06-2017
Accepted Solution

Impala Partitioning by Name question

Wondering if there’s a better way to do this type of partitioning where I want to partition the data based on First Name stating with a-z and Last Name starting with a-z creating 676 buckets due to how the table is queried and volume.   I wasn’t sure if there is someway to dynamically create the partition. What I did was create (in the example below I only left it to 3 partitions for the sake of typing) partitions based on the First letter of the names. Then select from my raw original table and inserted into the partition point based on the criteria. Is this the most efficient way of doing this when querying the table by firstname and lastname or are there better ways to partition this table or improve query performance?

 

Example:

 

CREATE EXTERNAL TABLE name_partition(

FName STRING,

LName STRING,

Notes String

)

PARTITIONED BY (FirstNameLetter STRING, LastNameLetter STRING)

STORED AS PARQUET

LOCATION '/data/shared_workspace/';

 

ALTER TABLE name_partition

ADD PARTITION (FirstNameLetter = 'a', LastNameLetter = 'a');

 

ALTER TABLE name_partition

ADD PARTITION (FirstNameLetter = 'a', LastNameLetter = 'b');

 

ALTER TABLE name_partition

ADD PARTITION (FirstNameLetter = 'a', LastNameLetter = 'c');

 

 

FROM name_raw

INSERT OVERWRITE TABLE name_partition PARTITION(FirstNameLetter ='a', LastNameLetter = 'a')

SELECT

FName,

Lname,

Notes

WHERE lower(Fname) like 'a%' AND lower(Lname)like 'a%'

 

INSERT OVERWRITE TABLE name_partition PARTITION(FirstNameLetter ='a', LastNameLetter = 'b')

SELECT

FName,

Lname,

Notes

WHERE lower(Fname) like 'a%' AND lower(Lname)like 'b%';

 

INSERT OVERWRITE TABLE name_partition PARTITION(FirstNameLetter ='a', LastNameLetter = 'c')

SELECT

FName,

Lname,

Notes

WHERE lower(Fname) like 'a%' AND lower(Lname)like 'c%';

 

 

 

 

Highlighted
New Contributor
Posts: 9
Registered: ‎03-06-2017

Re: Impala Partitioning by Name question

Dynamic Partitioning is an alternative to this per reading Cloudera Partitioning

 

SET hive.exec.dynamic.partition.mode=nonstrict;

SET hive.exec.dynamic.partition=true;

INSERT OVERWRITE TABLE T PARTITION (FirstNameLetter, LastNameLetter) SELECT ...., lower(substr(name, 0, 1)), lower(substr(name, 1, 1))

New Contributor
Posts: 9
Registered: ‎03-06-2017

Re: Impala Partitioning by Name question

To set this in Impala to execute either as a SQL file or hue you would set the variables as shown in the first 2 lines below.  Below is an example of using static partitioning and dynamic partitioning together where the EffDTM and FirstNameLetter are static partitioning and LastNameLetter uses dynamic partitioning.

 

set var:hive.exec.dynamic.partition.mode=nonstrict;
set var:hive.exec.dynamic.partition=true;
INSERT OVERWRITE TABLE person_by_name PARTITION (EffDTM='201701', FirstNameLetter='a', LastNameLetter)
firstname,

lastname,

..

..

lower(substr(lastname, 0, 1)) as LastNameLetter
FROM person

WHERE lower(substr(firstname, 0, 1)) = 'a';

Announcements