Unable to create unique partitions in Hive

I am using the Cloudera VirtualBox VM. When I run a dynamic-partition insert, Hive creates every partition, whether its value is unique or not.

 

create table product_order1(id int,user_id int,amount int,product string, city string, txn_date string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
LOAD DATA LOCAL INPATH 'txn' INTO TABLE product_order1;
Loading data to table oct19.product_order1
Table oct19.product_order1 stats: [numFiles=1, totalSize=303]

OK Time taken: 0.426 seconds

hive> set hive.exec.dynamic.partition = true;
hive> set hive.exec.dynamic.partition.mode = true;

hive> create table dyn_part(id int,user_id int,amount int,product string,city string) PARTITIONED BY(txn_date string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

OK Time taken: 0.14 seconds

hive> INSERT OVERWRITE TABLE dyn_part PARTITION(txn_date) select id,user_id,amount,product,city,txn_date from product_order1;

Result I received:

Loading data to table oct19.dyn_part partition (txn_date=null)
 Time taken for load dynamic partitions : 944
Loading partition {txn_date=04-02-2015}
Loading partition {txn_date= 03-04-2015}
Loading partition {txn_date=01-02-2015}
Loading partition {txn_date=03-04-2015}
Loading partition {txn_date= 01-01-2015}
Loading partition {txn_date=01-01-2015}
Loading partition {txn_date= 01-02-2015}
 Time taken for adding to write entity : 5

Partition oct19.dyn_part{txn_date= 01-01-2015} stats: [numFiles=1, numRows=1, totalSize=25, rawDataSize=24]
Partition oct19.dyn_part{txn_date= 01-02-2015} stats: [numFiles=1, numRows=1, totalSize=25, rawDataSize=24]
Partition oct19.dyn_part{txn_date= 03-04-2015} stats: [numFiles=1, numRows=2, totalSize=50, rawDataSize=48]
Partition oct19.dyn_part{txn_date=01-01-2015} stats: [numFiles=1, numRows=1, totalSize=26, rawDataSize=25]
Partition oct19.dyn_part{txn_date=01-02-2015} stats: [numFiles=1, numRows=1, totalSize=26, rawDataSize=25]
Partition oct19.dyn_part{txn_date=03-04-2015} stats: [numFiles=1, numRows=1, totalSize=26, rawDataSize=25]
Partition oct19.dyn_part{txn_date=04-02-2015} stats: [numFiles=1, numRows=1, totalSize=25, rawDataSize=24]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.03 sec HDFS Read: 4166 HDFS Write: 614 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 30 msec

 

 

1 ACCEPTED SOLUTION

Super Guru
@priyanka1_munja,

Are you saying that the same partition appears multiple times? Did you notice the extra space before some of the partition keys, for example "03-04-2015" vs " 03-04-2015"? I think that's the reason for the duplicates: Hive treats the two strings as different partition values.

Cheers
Eric
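
One way to confirm the stray whitespace, as a rough sketch that assumes the source table is still product_order1 from the question (the column alias raw_txn_date is just illustrative):

```sql
-- List rows whose txn_date has leading or trailing spaces.
-- The brackets make the stray space visible in the CLI output.
SELECT id,
       concat('[', txn_date, ']') AS raw_txn_date
FROM product_order1
WHERE length(txn_date) <> length(trim(txn_date));
```

Any row returned here would end up in its own space-prefixed partition during a dynamic-partition insert.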


2 REPLIES

Yes, I hadn't noticed the leading space. I used trim() on the partition column to fix it.
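
For anyone landing here later, the fix probably looked something like the sketch below. It reuses the table names from the question. The exact statement the poster ran is not shown in the thread, so treat this as an assumption. Note that the documented values for hive.exec.dynamic.partition.mode are strict and nonstrict; the session above set it to true, which appears to have been accepted, but nonstrict is the clearer choice.

```sql
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Trim the partition column so ' 03-04-2015' and '03-04-2015'
-- land in the same partition.
INSERT OVERWRITE TABLE dyn_part PARTITION (txn_date)
SELECT id, user_id, amount, product, city, trim(txn_date) AS txn_date
FROM product_order1;

-- Assumption: a dynamic-partition overwrite only replaces the partitions
-- it writes, so the space-prefixed partitions from the earlier run
-- may still exist and need to be dropped explicitly.
ALTER TABLE dyn_part DROP IF EXISTS PARTITION (txn_date = ' 01-01-2015');
ALTER TABLE dyn_part DROP IF EXISTS PARTITION (txn_date = ' 01-02-2015');
ALTER TABLE dyn_part DROP IF EXISTS PARTITION (txn_date = ' 03-04-2015');
```

Running SHOW PARTITIONS dyn_part; afterwards should list each date only once.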