Sqoop insert into partitioned table

Contributor

Hi, I am looking to run two Sqoop commands:

1. Pull the entire content of an Oracle table and insert it into a partitioned Hive table using Sqoop:

sqoop import \
  --connect jdbc:oracle:thin:@//jfadboc1.jfa.unibet.com:1521/xxx \
  --username xxx --password xxx \
  --table DW_FACT_PUNTER_TEMP \
  --split-by TIME_KEY \
  --target-dir unica/data/DW_FACT_PUNTER_TEST \
  --hive-import --hive-overwrite --hive-drop-import-delims \
  --null-non-string '\\N' --null-string '\\N' \
  --hive-table unica.DW_FACT_PUNTER_TEST_TEMP \
  --hive-partition-key "TIME_YEAR", "TIME_MONTH","TIME_DAY"

2. Do an incremental load with Sqoop and insert into the same partitioned table:

sqoop import \
  --connect jdbc:oracle:thin:@//jfadboc1.jfa.unibet.com:1521/xxx \
  --username xxx --password xxx \
  --query "SELECT TIME_KEY,PUNTER_KEY,PRODUCT_KEY,INDICATOR_KEY,INDICATOR_VALUE,INSERT_TIME FROM DW_FACT_PUNTER_TEST WHERE \$CONDITIONS AND (TIME_KEY >=20010101)" \
  --split-by TIME_KEY \
  --target-dir unica/data/FACT_PUNTER_IUD_UNICA_INCR \
  --hive-import --hive-overwrite --hive-drop-import-delims \
  --null-non-string '\\N' --null-string '\\N' \
  --hive-table unica.DW_FACT_PUNTER_TEST_TEMP \
  --hive-partition-key "TIME_YEAR"

My table structure is below

CREATE TABLE IF NOT EXISTS DW_FACT_PUNTER_TEST_TEMP
(PUNTER_KEY DECIMAL(36,2), 
PRODUCT_KEY DECIMAL(36,2), 
INDICATOR_KEY DECIMAL(36,2), 
INDICATOR_VALUE DECIMAL(36,2), 
INSERT_TIME STRING) 
PARTITIONED BY(TIME_YEAR STRING, TIME_MONTH STRING,TIME_DAY STRING);

Neither of these Sqoop commands is working.

Can you please help me understand how to use --hive-partition-key for a mass import when more than one column is used to partition the table?

1 ACCEPTED SOLUTION

Expert Contributor

@Parinita Kothari To import into a multi-key partitioned Hive table, you can make use of --hcatalog-table:

--table <tblname> --hcatalog-table <tblname> --hcatalog-partition-keys k1,k2 --hcatalog-partition-values 'v1,v2'

e.g.: --table DW_FACT_PUNTER_TEST_TEMP --hcatalog-table DW_FACT_PUNTER_TEST_TEMP --hcatalog-partition-keys TIME_YEAR,TIME_MONTH --hcatalog-partition-values '2014-04-11,2014-APR'
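
Put together with the connection details from the question, the full import might look like the sketch below. This is only a sketch, not a tested command: the JDBC URL, credentials and source table are copied from the question, the partition values '2014,APR,11' are placeholders, and the --target-dir and --hive-import/--hive-overwrite options from the original command are left out because they are not supported together with the --hcatalog-* options.

# sketch: static multi-key partition import via HCatalog (partition values are placeholders)
sqoop import \
  --connect jdbc:oracle:thin:@//jfadboc1.jfa.unibet.com:1521/xxx \
  --username xxx --password xxx \
  --table DW_FACT_PUNTER_TEMP \
  --split-by TIME_KEY \
  --hcatalog-database unica \
  --hcatalog-table DW_FACT_PUNTER_TEST_TEMP \
  --hcatalog-partition-keys TIME_YEAR,TIME_MONTH,TIME_DAY \
  --hcatalog-partition-values '2014,APR,11'

Note that with static --hcatalog-partition-values every imported row lands in that single partition, which is why the dynamic-partition approach discussed further down is usually the better fit for a full history load.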


4 REPLIES


Contributor

Thank you @Dileep Kumar Chiguruvada for your quick response.

Is it necessary to hardcode the values for --hcatalog-partition-values?

The table I am pulling data from (in Oracle) has data going back to 2001-01-01, and it would be nice to be able to use dynamic partitions here.

Expert Contributor

@Parinita Kothari Yes, you can do that too.

Create a Hive partitioned table and set hive.exec.dynamic.partition.mode=nonstrict for dynamic partitioning, then import the table from the DB into the partitioned table.

Here you need not specify --hcatalog-partition-keys and --hcatalog-partition-values; just --table <tblname> --hcatalog-table <tblname> is enough, as in the sketch below.
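
A minimal sketch of that dynamic-partition variant, reusing the connection details from the question. It assumes the Oracle source table exposes TIME_YEAR, TIME_MONTH and TIME_DAY columns matching the Hive partition keys so that HCatalog can route each row to its partition, and that hive.exec.dynamic.partition.mode=nonstrict is set as described above:

# sketch: dynamic multi-key partition import via HCatalog (no partition keys/values on the command line)
sqoop import \
  --connect jdbc:oracle:thin:@//jfadboc1.jfa.unibet.com:1521/xxx \
  --username xxx --password xxx \
  --table DW_FACT_PUNTER_TEMP \
  --split-by TIME_KEY \
  --hcatalog-database unica \
  --hcatalog-table DW_FACT_PUNTER_TEST_TEMP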

Contributor

Thank you @Dileep Kumar Chiguruvada, it works now. I had to use Sqoop to import the contents into a temp table (which wasn't partitioned) and then use that temp table to insert into the actual partitioned table. I couldn't really find a direct way for Sqoop to ingest data straight into a table that is partitioned by more than one column.
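
For reference, a rough sketch of that two-step workaround. The staging table name unica.DW_FACT_PUNTER_STG is hypothetical, the column list follows the DDL from the question, and it assumes the Oracle source carries TIME_YEAR, TIME_MONTH and TIME_DAY columns for the dynamic-partition insert:

# 1) Stage the Oracle table into a non-partitioned Hive table (hypothetical name)
sqoop import \
  --connect jdbc:oracle:thin:@//jfadboc1.jfa.unibet.com:1521/xxx \
  --username xxx --password xxx \
  --table DW_FACT_PUNTER_TEMP \
  --split-by TIME_KEY \
  --hive-import --hive-overwrite --hive-drop-import-delims \
  --null-non-string '\\N' --null-string '\\N' \
  --hive-table unica.DW_FACT_PUNTER_STG

# 2) Re-load from the staging table into the partitioned table using dynamic partitions
hive -e "
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE unica.DW_FACT_PUNTER_TEST_TEMP
PARTITION (TIME_YEAR, TIME_MONTH, TIME_DAY)
SELECT PUNTER_KEY, PRODUCT_KEY, INDICATOR_KEY, INDICATOR_VALUE, INSERT_TIME,
       TIME_YEAR, TIME_MONTH, TIME_DAY
FROM unica.DW_FACT_PUNTER_STG;"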