Unable to create unique partitions in Hive

I am using the Cloudera VirtualBox VM. When I load data with dynamic partitioning, Hive creates what look like duplicate partitions instead of one partition per unique date.

create table product_order1(id int,user_id int,amount int,product string, city string, txn_date string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
LOAD DATA LOCAL INPATH 'txn' INTO TABLE product_order1;
Loading data to table oct19.product_order1
Table oct19.product_order1 stats: [numFiles=1, totalSize=303]

OK Time taken: 0.426 seconds

hive> set hive.exec.dynamic.partition = true;
hive> set hive.exec.dynamic.partition.mode = true;

hive> create table dyn_part(id int,user_id int,amount int,product string,city string) PARTITIONED BY(txn_date string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

OK Time taken: 0.14 seconds

hive> INSERT OVERWRITE TABLE dyn_part PARTITION(txn_date) select id,user_id,amount,product,city,txn_date from product_order1;

Result I received:

Loading data to table oct19.dyn_part partition (txn_date=null)
 Time taken for load dynamic partitions : 944
Loading partition {txn_date=04-02-2015}
Loading partition {txn_date= 03-04-2015}
Loading partition {txn_date=01-02-2015}
Loading partition {txn_date=03-04-2015}
Loading partition {txn_date= 01-01-2015}
Loading partition {txn_date=01-01-2015}
Loading partition {txn_date= 01-02-2015}
 Time taken for adding to write entity : 5

Partition oct19.dyn_part{txn_date= 01-01-2015} stats: [numFiles=1, numRows=1, totalSize=25, rawDataSize=24]
Partition oct19.dyn_part{txn_date= 01-02-2015} stats: [numFiles=1, numRows=1, totalSize=25, rawDataSize=24]
Partition oct19.dyn_part{txn_date= 03-04-2015} stats: [numFiles=1, numRows=2, totalSize=50, rawDataSize=48]
Partition oct19.dyn_part{txn_date=01-01-2015} stats: [numFiles=1, numRows=1, totalSize=26, rawDataSize=25]
Partition oct19.dyn_part{txn_date=01-02-2015} stats: [numFiles=1, numRows=1, totalSize=26, rawDataSize=25]
Partition oct19.dyn_part{txn_date=03-04-2015} stats: [numFiles=1, numRows=1, totalSize=26, rawDataSize=25]
Partition oct19.dyn_part{txn_date=04-02-2015} stats: [numFiles=1, numRows=1, totalSize=25, rawDataSize=24]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.03 sec HDFS Read: 4166 HDFS Write: 614 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 30 msec

1 ACCEPTED SOLUTION

@priyanka1_munja,

Are you asking why the same partition appears multiple times? Did you notice the extra space before some of the partition keys, for example "03-04-2015" vs " 03-04-2015"? Hive treats those as two different values, so I think that's the reason for the duplicates.

Cheers
Eric
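
For anyone wanting to confirm this in the source table, a query along these lines should make the stray whitespace visible. It is only a sketch against the product_order1 table from the question: a clean date such as 03-04-2015 is 10 characters long, so a value with one leading space would show a length of 11.

```sql
-- List each distinct txn_date with its character length and row count.
-- Values that look identical but report different lengths differ only
-- in whitespace (leading or trailing).
SELECT txn_date,
       length(txn_date) AS value_length,
       count(*)         AS row_count
FROM product_order1
GROUP BY txn_date;
```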


Yes, I didn't notice the leading space. I used trim() on the column to fix it.
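
The exact statement used is not shown in the thread. A minimal sketch of the fix, assuming trim() is applied to txn_date in the SELECT that feeds the dynamic partition, could look like this. It also sets the partition mode to nonstrict, which is the documented value for that setting (the session above set it to true).

```sql
-- Enable dynamic partitioning; documented modes are strict and nonstrict.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Strip leading/trailing spaces from the partition column so that
-- ' 03-04-2015' and '03-04-2015' land in the same partition.
-- The dynamic partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE dyn_part PARTITION (txn_date)
SELECT id, user_id, amount, product, city,
       trim(txn_date) AS txn_date
FROM product_order1;
```

Note that INSERT OVERWRITE with dynamic partitions only replaces the partitions it writes to, so the space-prefixed partitions from the earlier run would likely remain and need to be dropped separately, for example with ALTER TABLE dyn_part DROP IF EXISTS PARTITION (txn_date=' 03-04-2015').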
