Creating a partitioned table in hive with sqoop

Heri — Wed, 06 May 2020 21:15:50 GMT

Hi guys,

I want the Hive table created by sqoop to have two partitions:

One for 2019 and another for 2020.

But for some reason it is putting "2019,2020" in the year column.

Command:

sqoop import \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--driver net.sourceforge.jtds.jdbc.Driver \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
--username 'lrm0613' \
--password-alias sqlserver2.password \
--query 'select jobhistoryid, requeueid, jobid,name,event,eventtime,submittime,starttime,service,year(EventTime) as year from JobHistory where year(EventTime)=2019 or year(EventTime)=2020 and $CONDITIONS' \
--hcatalog-database dataengsandbox \
--hcatalog-table JobHistoryPartitioned \
--hive-partition-key year \
--hive-partition-value 2019,2020 \
--compress \
--compression-codec snappy \
--map-column-hive jobid=int \
-m 1 \
--create-hcatalog-table \
--hcatalog-storage-stanza 'stored as parquet'

Re: Creating a partitioned table in hive with sqoop

aakulov — Fri, 22 May 2020 19:51:00 GMT

Sqoop can only insert into a single Hive partition at one time. To accomplish what you are trying to do, you can have two separate sqoop commands:

sqoop with --query ... where year(EventTime)=2019 (remove year(EventTime)=2020) and set --hive-partition-value 2019 (not 2020)
sqoop with --query ... where year(EventTime)=2020 (remove year(EventTime)=2019) and set --hive-partition-value 2020 (not 2019)

This way sqoop will write into the one partition you want. Since this is one-time import, the solution should work just fine. Let me know if this works and accept the answer if it makes sense.

question Creating a partitioned table in hive with sqoop in Support Questions

Creating a partitioned table in hive with sqoop

Re: Creating a partitioned table in hive with sqoop