
Sqoop import to partitioned hive table


New Contributor

Hi all, I want to create a simple partitioned Hive table and populate it with a Sqoop import command.

1. The table has, say, 4 columns: ID, col1, col2, col3.

2. One of the columns, say col2, is of int type and contains only the values 1 to 10.

3. I need to partition the table on col2: rows with values 1 to 5 should go into one partition and the rest into another.

4. Once that is done, I need to populate the table with a Sqoop import from my SQL Server.

I have tried many ways but have not been able to get it working. Can anyone please help?


Re: Sqoop import to partitioned hive table

Champion

@SqlEnthu

 

Could you refer to the below? Note that the partition column goes in the PARTITIONED BY clause, not in the column list:

CREATE TABLE temps_orc_partition_date
(col1 string)
PARTITIONED BY (col2 int)
STORED AS ORC;



For the Sqoop import you can use the below parameters:

--hive-partition-key col2 \
--hive-partition-value 1
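Putting those parameters together, a complete command might look like the sketch below. The JDBC URL, credentials, and table names are placeholder assumptions, not from the thread. The command is assembled into a variable and echoed so it can be inspected without a cluster; drop the echo to run it on a node where Sqoop is installed:

```shell
# Illustrative Sqoop import into a single static Hive partition.
# The JDBC URL, credentials, and table names below are placeholders.
SQOOP_CMD='sqoop import
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=mydb"
  --username myuser --password-file /user/cloudera/.dbpass
  --table source_table
  --columns "ID,col1,col3"
  --where "col2 = 1"
  --hive-import
  --hive-table temps_orc_partition_date
  --hive-partition-key col2
  --hive-partition-value 1
  --num-mappers 1'

# Print the assembled command; remove the echo (and the surrounding
# quoting) to execute it for real.
echo "$SQOOP_CMD"
```

Note that --hive-partition-value takes a single static value, so this loads only the col2=1 partition; you would repeat the import with a different --where filter and partition value for each partition.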

Re: Sqoop import to partitioned hive table

Contributor

Hello, @SqlEnthu

Below is a worked example that might help you. The sample data (name, no):

'sai',1
'ven',2
'krish',3
'ram',4
'rajesh',5
'eas',6
'res',7
'resd',8
'dasd',9
'dadsd',10

hive> use datakare;
OK
Time taken: 0.07 seconds
hive> create table part(name string,no int)
    > row format delimited 
    > fields terminated by ',';
OK
Time taken: 4.803 seconds
hive> load data local inpath 'Desktop/ntile' into table part;

hive> select name,no,ntile(2) over (order by no ASC) from part;

OK
'sai'	1	1
'ven'	2	1
'krish'	3	1
'ram'	4	1
'rajesh'	5	1
'eas'	6	2
'res'	7	2
'resd'	8	2
'dasd'	9	2
'dadsd'	10	2
Time taken: 139.398 seconds, Fetched: 10 row(s)

hive> select name,no,ntile(2) over (order by no ASC) as number from part;
Total MapReduce CPU Time Spent: 9 seconds 60 msec
OK
'sai'	1	1
'ven'	2	1
'krish'	3	1
'ram'	4	1
'rajesh'	5	1
'eas'	6	2
'res'	7	2
'resd'	8	2
'dasd'	9	2
'dadsd'	10	2
Time taken: 58.913 seconds, Fetched: 10 row(s)



hive> create external table part1(name string, no int) partitioned by(ntiln int)
    > row format delimited 
    > fields terminated by ',';
OK
Time taken: 0.179 seconds
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;

hive> insert into table  part1 partition(ntiln)
    > select name,no,ntile(2) over (order by no ASC) as number from part;

Query ID = cloudera_20181102092121_4c4808c9-8eec-4223-b32f-ae2e6e9faee4
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1541173857731_0003, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1541173857731_0003/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1541173857731_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-11-02 09:21:16,070 Stage-1 map = 0%,  reduce = 0%
2018-11-02 09:21:32,811 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.4 sec
2018-11-02 09:21:42,542 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.46 sec
MapReduce Total cumulative CPU time: 5 seconds 460 msec
Ended Job = job_1541173857731_0003
Loading data to table datakare.part1 partition (ntiln=null)
	 Time taken for load dynamic partitions : 1976
	Loading partition {ntiln=1}
	Loading partition {ntiln=2}
	 Time taken for adding to write entity : 2
Partition datakare.part1{ntiln=1} stats: [numFiles=1, numRows=5, totalSize=45, rawDataSize=40]
Partition datakare.part1{ntiln=2} stats: [numFiles=1, numRows=5, totalSize=45, rawDataSize=40]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.46 sec   HDFS Read: 8963 HDFS Write: 209 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 460 msec
OK
Time taken: 41.754 seconds
hive> show partitions part1;
OK
ntiln=1
ntiln=2
Time taken: 0.16 seconds, Fetched: 2 row(s)
hive>  exit;

[cloudera@quickstart ~]$ hadoop fs -cat /user/hive/warehouse/datakare.db/part1/ntiln=1/*
'sai',1
'ven',2
'krish',3
'ram',4
'rajesh',5
[cloudera@quickstart ~]$ hadoop fs -cat /user/hive/warehouse/datakare.db/part1/ntiln=2/*
'eas',6
'res',7
'resd',8
'dasd',9
'dadsd',10
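Tying this back to the original question: since Sqoop's --hive-partition-value accepts a single static value per run, one possible approach (a sketch only; the connection string, credentials, and source table name are placeholders, not from the thread) is to run one import per target partition, filtering the source rows with --where:

```shell
# Sketch: one Sqoop import per target partition, splitting the source
# rows with --where (col2 1-5 -> partition 1, col2 6-10 -> partition 2).
# The JDBC URL, credentials, and source table name are placeholders.
cmds=""
for part in 1 2; do
  if [ "$part" -eq 1 ]; then
    where="col2 BETWEEN 1 AND 5"
  else
    where="col2 BETWEEN 6 AND 10"
  fi
  cmd="sqoop import
    --connect 'jdbc:sqlserver://dbhost:1433;databaseName=mydb'
    --username myuser --password-file /user/cloudera/.dbpass
    --table source_table
    --where \"$where\"
    --hive-import
    --hive-table part1
    --hive-partition-key ntiln
    --hive-partition-value $part"
  cmds="$cmds$cmd

"
done
# Echoed for inspection; on a node with Sqoop installed, run each
# assembled command directly instead of printing it.
echo "$cmds"
```

Each run lands in its own static partition of the part1 table created above, so no dynamic partitioning settings are needed on the Sqoop side.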