Support Questions

Find answers, ask questions, and share your expertise

Do we need to create a normal managed table before we make a partition table in hive

avatar

I am literally confuse that do we need to create a normal table before creting a partition table , help is appreciated

1 ACCEPTED SOLUTION

avatar

We can create partition on both External as well as Managed tables. Yes we need to define partition before creating the tables.

More on performance related go to below link

https://community.hortonworks.com/questions/15161/can-we-apply-the-partitioning-on-the-already-exist...

See below example of a partitionon External table.

CREATE EXTERNAL TABLE `myTable`( `ossc_rc` string, `subnetwork1` string) PARTITIONED BY ( `part` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://location/ready' TBLPROPERTIES ( 'transient_lastDdlTime'='1433520068') ;

View solution in original post

6 REPLIES 6

avatar

We can create partition on both External as well as Managed tables. Yes we need to define partition before creating the tables.

More on performance related go to below link

https://community.hortonworks.com/questions/15161/can-we-apply-the-partitioning-on-the-already-exist...

See below example of a partitionon External table.

CREATE EXTERNAL TABLE `myTable`( `ossc_rc` string, `subnetwork1` string) PARTITIONED BY ( `part` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://location/ready' TBLPROPERTIES ( 'transient_lastDdlTime'='1433520068') ;

avatar

HI Mukesh ,

thanks for the reply but my question here is I am explaining in steps The first step for me is to create a managed partiton table like CREATE TABLE `myTable`( `ossc_rc` string, `subnetwork1` string) PARTITIONED BY ( `part` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

now my step is to load the partition table right!

my text input file is mytable.txt ossc_rc part subnetwork

1 cds 34r

4 fe 964

5 cds 35r

4 cds 36r

please remember that my 2nd column in the file is part which you have taken for partition

now i have to load data into the managed partition table and the statement will be

LOAD DATA LOCAL INPATH '/HOME/HIMANSHU/MYTABLE.TXT' PARTITION (par ='cds');

now here i m getting confused while loading data into the table how it will know that which column in my file contains cds and which column actually it is

do we need to keep my partition column in the last column?

avatar

Yes, Because it is required to map schema, the last column in file is partitioned column. But if you are loading from another table then in select statement keep your partitioned column last.

avatar

Thanks mukesh

But can I knw what if In my textfile (20 columns ) that is comma separated in which i want to partition by a columns 3rd and 4th and I just want to directly create a managed partition table now because the file size is too long that i cannot bring down my third, fourth column to the last .In such situation what is the best alternative

avatar

Hi @Himanshu Rawat There are two approach for this problem.

1. Create partition wise separate files using unix or any other tool and load them on individually in static partitions like below:-

ALTER TABLE Unm_Parti ADD PARTITION (Department='A')
location '/user/mukesh/HIVE/HiveTrailFolder/A';
ALTER TABLE Unm_Parti ADD PARTITION (Department='B')
location '/user/mukesh/HIVE/HiveTrailFolder/B';
ALTER TABLE Unm_Parti ADD PARTITION (Department='C')
location '/user/mukesh/HIVE/HiveTrailFolder/C';

2. Create external table and put file into external table HDFS location, we can call it as staging table.

Now create final partition table and load it using dynamic partition enable:-

1. set hive.exec.dynamic.partition=true
     This enable dynamic partitions, by default it is false.
2. set hive.exec.dynamic.partition.mode=nonstrict
     We are using the dynamic partition without a static
     partition (A table can be partitioned based    
     on multiple columns in hive) in such case we have to           
     enable the non strict mode. In strict mode we can use             
     dynamic partition  only with a Static Partition.

Now use below statement to load data:-

INSERT OVERWRITE TABLE Final_Table PARTITION(c2) SELECT c1, c4,c3,c2 FROM stage_table; 

avatar

can i get your number so that i can clear the concept of partitioning

Help appreciated

Mukesh