I want the Hive table created by sqoop to have two partitions:
One for 2019 and another for 2020.
But for some reason it is putting "2019,2020" in the year column.
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select jobhistoryid, requeueid, jobid, name, event, eventtime, submittime, starttime, service, year(EventTime) as year from JobHistory where year(EventTime)=2019 or year(EventTime)=2020 and $CONDITIONS' \
  --hcatalog-database dataengsandbox \
  --hcatalog-table JobHistoryPartitioned \
  --hive-partition-key year \
  --hive-partition-value 2019,2020 \
  --compress \
  --compression-codec snappy \
  --map-column-hive jobid=int \
  -m 1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as parquet'
Sqoop can only insert into a single Hive partition at a time, so `--hive-partition-value 2019,2020` is treated as one literal partition value rather than two partitions. To accomplish what you are trying to do, run two separate sqoop commands, one per year.
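A minimal sketch of the two runs, adapted from the command in your question (connection string, credentials, and table names are kept as you posted them; the WHERE clause is narrowed to one year per run, and I'm assuming the second run should not pass `--create-hcatalog-table` since the table will already exist after the first run):

```shell
# Run 1: create the table and load only 2019 rows into the year=2019 partition.
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select jobhistoryid, requeueid, jobid, name, event, eventtime, submittime, starttime, service from JobHistory where year(EventTime)=2019 and $CONDITIONS' \
  --hcatalog-database dataengsandbox \
  --hcatalog-table JobHistoryPartitioned \
  --hive-partition-key year \
  --hive-partition-value 2019 \
  --compress \
  --compression-codec snappy \
  --map-column-hive jobid=int \
  -m 1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as parquet'

# Run 2: same import, but for 2020, writing into the year=2020 partition.
# Note --create-hcatalog-table is dropped because the table already exists.
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select jobhistoryid, requeueid, jobid, name, event, eventtime, submittime, starttime, service from JobHistory where year(EventTime)=2020 and $CONDITIONS' \
  --hcatalog-database dataengsandbox \
  --hcatalog-table JobHistoryPartitioned \
  --hive-partition-key year \
  --hive-partition-value 2020 \
  --compress \
  --compression-codec snappy \
  --map-column-hive jobid=int \
  -m 1
```

Also note that the partition column is dropped from the SELECT list here: when you import into a static partition via `--hive-partition-key`/`--hive-partition-value`, the partition value comes from the flag, not from the query.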
This way sqoop writes into exactly the one partition you specify on each run. Since this is a one-time import, the two-command approach should work just fine. Let me know if this works, and accept the answer if it makes sense.