Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Creating a partitioned table in hive with sqoop

avatar
Contributor

Hi guys,

 

 I want the Hive table created by sqoop to have two partitions:

 

One for 2019 and another for 2020.

 

But for some reason it is putting "2019,2020" in the year column.

 

Command:

 

sqoop import \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--driver net.sourceforge.jtds.jdbc.Driver \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
--username 'lrm0613' \
--password-alias sqlserver2.password \
--query 'select jobhistoryid, requeueid, jobid,name,event,eventtime,submittime,starttime,service,year(EventTime) as year from JobHistory where year(EventTime)=2019 or year(EventTime)=2020 and $CONDITIONS' \
--hcatalog-database dataengsandbox \
--hcatalog-table JobHistoryPartitioned \
--hive-partition-key year \
--hive-partition-value 2019,2020 \
--compress \
--compression-codec snappy \
--map-column-hive jobid=int \
-m 1 \
--create-hcatalog-table \
--hcatalog-storage-stanza 'stored as parquet'

1 ACCEPTED SOLUTION

avatar
Master Collaborator

Sqoop can only insert into a single Hive partition at one time. To accomplish what you are trying to do, you can have two separate sqoop commands:

  1. sqoop with --query ... where year(EventTime)=2019 (remove year(EventTime)=2020) and set --hive-partition-value 2019 (not 2020)
  2. sqoop with --query ... where year(EventTime)=2020 (remove year(EventTime)=2019) and set --hive-partition-value 2020 (not 2019)

This way sqoop will write into the one partition you want. Since this is one-time import, the solution should work just fine. Let me know if this works and accept the answer if it makes sense.

View solution in original post

1 REPLY 1

avatar
Master Collaborator

Sqoop can only insert into a single Hive partition at one time. To accomplish what you are trying to do, you can have two separate sqoop commands:

  1. sqoop with --query ... where year(EventTime)=2019 (remove year(EventTime)=2020) and set --hive-partition-value 2019 (not 2020)
  2. sqoop with --query ... where year(EventTime)=2020 (remove year(EventTime)=2019) and set --hive-partition-value 2020 (not 2019)

This way sqoop will write into the one partition you want. Since this is one-time import, the solution should work just fine. Let me know if this works and accept the answer if it makes sense.