New Contributor
Posts: 1
Registered: ‎10-10-2018

Sqoop import to partitioned hive table

Hi All, I want to create a simple partitioned Hive table and populate it with a Sqoop import command.

1. The table has, say, 4 columns: ID, col1, col2, col3.

2. One of the columns, say col2, is of int type and contains only the values 1 to 10.

3. I need to partition the table on col2 so that rows with values 1 to 5 land in one partition and the rest in another.

4. Once that is done, I need to populate the table with a Sqoop import from my SQL Server database.

I have tried many ways but have not been able to do it. Can anyone please help?

Champion
Posts: 753
Registered: ‎05-16-2016

Re: Sqoop import to partitioned hive table

@SqlEnthu

 

Could you refer to the below? Note that a partition column goes in the PARTITIONED BY clause (as a column definition, not a value) and must not also appear in the regular column list:

CREATE TABLE temps_orc_partition_date
(col1 string)
PARTITIONED BY (col2 int)
STORED AS ORC;



For the Sqoop import you can use the below parameters:

--hive-partition-key col2 \
--hive-partition-value 1
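One caveat: `--hive-partition-value` writes every imported row into a single static partition, so to get values 1 to 5 into one partition and the rest into another, you would run one import per partition and filter rows with `--where`. A sketch follows; the JDBC URL, credentials, table, and partition-column names are placeholders, not taken from this thread:

```shell
# Hypothetical connection details -- replace with your own.
# Partition bucket=1 receives rows with col2 between 1 and 5:
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=mydb" \
  --username myuser --password-file /user/me/.dbpass \
  --table source_table \
  --where "col2 BETWEEN 1 AND 5" \
  --hive-import \
  --hive-table mytable \
  --hive-partition-key bucket \
  --hive-partition-value 1

# Partition bucket=2 receives the remaining rows (col2 6 to 10):
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=mydb" \
  --username myuser --password-file /user/me/.dbpass \
  --table source_table \
  --where "col2 BETWEEN 6 AND 10" \
  --hive-import \
  --hive-table mytable \
  --hive-partition-key bucket \
  --hive-partition-value 2
```

Using `--password-file` rather than `--password` keeps the credential out of the shell history; any file readable only by you on HDFS works.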
Explorer
Posts: 30
Registered: ‎05-18-2017

Re: Sqoop import to partitioned hive table

Hello @SqlEnthu,

Below is a solution that might help you. First, the sample input data (saved to a local file named 'ntile'):

'sai',1
'ven',2
'krish',3
'ram',4
'rajesh',5
'eas',6
'res',7
'resd',8
'dasd',9
'dadsd',10

hive> use datakare;
OK
Time taken: 0.07 seconds
hive> create table part(name string,no int)
    > row format delimited 
    > fields terminated by ',';
OK
Time taken: 4.803 seconds
hive> load data local inpath 'Desktop/ntile' into table part;

hive> select name,no,ntile(2) over (order by no ASC) from part;

OK
'sai'	1	1
'ven'	2	1
'krish'	3	1
'ram'	4	1
'rajesh'	5	1
'eas'	6	2
'res'	7	2
'resd'	8	2
'dasd'	9	2
'dadsd'	10	2
Time taken: 139.398 seconds, Fetched: 10 row(s)

hive> select name,no,ntile(2) over (order by no ASC) as number from part;
Total MapReduce CPU Time Spent: 9 seconds 60 msec
OK
'sai'	1	1
'ven'	2	1
'krish'	3	1
'ram'	4	1
'rajesh'	5	1
'eas'	6	2
'res'	7	2
'resd'	8	2
'dasd'	9	2
'dadsd'	10	2
Time taken: 58.913 seconds, Fetched: 10 row(s)



hive> create external table part1(name string, no int) partitioned by(ntiln int)
    > row format delimited 
    > fields terminated by ',';
OK
Time taken: 0.179 seconds
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;

hive> insert into table  part1 partition(ntiln)
    > select name,no,ntile(2) over (order by no ASC) as number from part;

Query ID = cloudera_20181102092121_4c4808c9-8eec-4223-b32f-ae2e6e9faee4
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1541173857731_0003, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1541173857731_0003/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1541173857731_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-11-02 09:21:16,070 Stage-1 map = 0%,  reduce = 0%
2018-11-02 09:21:32,811 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.4 sec
2018-11-02 09:21:42,542 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.46 sec
MapReduce Total cumulative CPU time: 5 seconds 460 msec
Ended Job = job_1541173857731_0003
Loading data to table datakare.part1 partition (ntiln=null)
	 Time taken for load dynamic partitions : 1976
	Loading partition {ntiln=1}
	Loading partition {ntiln=2}
	 Time taken for adding to write entity : 2
Partition datakare.part1{ntiln=1} stats: [numFiles=1, numRows=5, totalSize=45, rawDataSize=40]
Partition datakare.part1{ntiln=2} stats: [numFiles=1, numRows=5, totalSize=45, rawDataSize=40]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.46 sec   HDFS Read: 8963 HDFS Write: 209 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 460 msec
OK
Time taken: 41.754 seconds
hive> show partitions part1;
OK
ntiln=1
ntiln=2
Time taken: 0.16 seconds, Fetched: 2 row(s)
hive>  exit;

[cloudera@quickstart ~]$ hadoop fs -cat /user/hive/warehouse/datakare.db/part1/ntiln=1/*
'sai',1
'ven',2
'krish',3
'ram',4
'rajesh',5
[cloudera@quickstart ~]$ hadoop fs -cat /user/hive/warehouse/datakare.db/part1/ntiln=2/*
'eas',6
'res',7
'resd',8
'dasd',9
'dadsd',10
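To tie this back to the original Sqoop question: one option is to Sqoop-import from the source database into the unpartitioned staging table `part`, then run the dynamic-partition insert from above to distribute rows across the two partitions. A sketch, with a hypothetical JDBC URL, credentials, and source table name:

```shell
# Hypothetical connection details -- replace with your own.
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=mydb" \
  --username myuser --password-file /user/me/.dbpass \
  --table source_table \
  --hive-import \
  --hive-database datakare \
  --hive-table part \
  --fields-terminated-by ','

# Then split the staged rows into the two partitions via ntile(2):
hive -e "
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table datakare.part1 partition(ntiln)
select name, no, ntile(2) over (order by no ASC) as number from datakare.part;
"
```

The `--fields-terminated-by ','` flag matches the comma delimiter declared in the `part` table DDL above.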