Created 07-29-2016 07:51 AM
I am literally confuse that do we need to create a normal table before creting a partition table , help is appreciated
Created 07-29-2016 08:01 AM
We can create partition on both External as well as Managed tables. Yes we need to define partition before creating the tables.
More on performance related go to below link
See below example of a partitionon External table.
CREATE EXTERNAL TABLE `myTable`( `ossc_rc` string, `subnetwork1` string) PARTITIONED BY ( `part` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://location/ready' TBLPROPERTIES ( 'transient_lastDdlTime'='1433520068') ;
Created 07-29-2016 08:01 AM
We can create partition on both External as well as Managed tables. Yes we need to define partition before creating the tables.
More on performance related go to below link
See below example of a partitionon External table.
CREATE EXTERNAL TABLE `myTable`( `ossc_rc` string, `subnetwork1` string) PARTITIONED BY ( `part` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://location/ready' TBLPROPERTIES ( 'transient_lastDdlTime'='1433520068') ;
Created 07-29-2016 08:18 AM
HI Mukesh ,
thanks for the reply but my question here is I am explaining in steps The first step for me is to create a managed partiton table like CREATE TABLE `myTable`( `ossc_rc` string, `subnetwork1` string) PARTITIONED BY ( `part` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
now my step is to load the partition table right!
my text input file is mytable.txt ossc_rc part subnetwork
1 cds 34r
4 fe 964
5 cds 35r
4 cds 36r
please remember that my 2nd column in the file is part which you have taken for partition
now i have to load data into the managed partition table and the statement will be
LOAD DATA LOCAL INPATH '/HOME/HIMANSHU/MYTABLE.TXT' PARTITION (par ='cds');
now here i m getting confused while loading data into the table how it will know that which column in my file contains cds and which column actually it is
do we need to keep my partition column in the last column?
Created 07-29-2016 09:20 AM
Yes, Because it is required to map schema, the last column in file is partitioned column. But if you are loading from another table then in select statement keep your partitioned column last.
Created 07-29-2016 09:36 AM
Thanks mukesh
But can I knw what if In my textfile (20 columns ) that is comma separated in which i want to partition by a columns 3rd and 4th and I just want to directly create a managed partition table now because the file size is too long that i cannot bring down my third, fourth column to the last .In such situation what is the best alternative
Created 07-29-2016 11:12 AM
Hi @Himanshu Rawat There are two approach for this problem.
1. Create partition wise separate files using unix or any other tool and load them on individually in static partitions like below:-
ALTER TABLE Unm_Parti ADD PARTITION (Department='A') location '/user/mukesh/HIVE/HiveTrailFolder/A';
ALTER TABLE Unm_Parti ADD PARTITION (Department='B') location '/user/mukesh/HIVE/HiveTrailFolder/B';
ALTER TABLE Unm_Parti ADD PARTITION (Department='C') location '/user/mukesh/HIVE/HiveTrailFolder/C';
2. Create external table and put file into external table HDFS location, we can call it as staging table.
Now create final partition table and load it using dynamic partition enable:-
1. set hive.exec.dynamic.partition=true This enable dynamic partitions, by default it is false. 2. set hive.exec.dynamic.partition.mode=nonstrict We are using the dynamic partition without a static partition (A table can be partitioned based on multiple columns in hive) in such case we have to enable the non strict mode. In strict mode we can use dynamic partition only with a Static Partition.
Now use below statement to load data:-
INSERT OVERWRITE TABLE Final_Table PARTITION(c2) SELECT c1, c4,c3,c2 FROM stage_table;
Created 07-29-2016 11:56 AM
can i get your number so that i can clear the concept of partitioning
Help appreciated
Mukesh