Created 07-07-2016 04:26 PM
Hi, I am looking to run two Sqoop commands:
1. Pull the entire content of a table and insert it into a partitioned Hive table:
sqoop import --connect jdbc:oracle:thin:@//jfadboc1.jfa.unibet.com:1521/xxx --username xxx --password xxx --table DW_FACT_PUNTER_TEMP --split-by TIME_KEY --target-dir unica/data/DW_FACT_PUNTER_TEST --hive-import --hive-overwrite --hive-drop-import-delims --null-non-string '\\N' --null-string '\\N' --hive-table unica.DW_FACT_PUNTER_TEST_TEMP --hive-partition-key "TIME_YEAR", "TIME_MONTH","TIME_DAY"
2. Do an incremental load in Sqoop and insert it into the partitioned table:
sqoop import --connect jdbc:oracle:thin:@//jfadboc1.jfa.unibet.com:1521/xxx --username xxx --password xxx --query "SELECT TIME_KEY,PUNTER_KEY,PRODUCT_KEY,INDICATOR_KEY,INDICATOR_VALUE,INSERT_TIME FROM DW_FACT_PUNTER_TEST WHERE \$CONDITIONS AND (TIME_KEY >=20010101)" --split-by TIME_KEY --target-dir unica/data/FACT_PUNTER_IUD_UNICA_INCR --hive-import --hive-overwrite --hive-drop-import-delims --null-non-string '\\N' --null-string '\\N' --hive-table unica.DW_FACT_PUNTER_TEST_TEMP --hive-partition-key "TIME_YEAR"
My table structure is below:
CREATE TABLE IF NOT EXISTS DW_FACT_PUNTER_TEST_TEMP (PUNTER_KEY DECIMAL(36,2), PRODUCT_KEY DECIMAL(36,2), INDICATOR_KEY DECIMAL(36,2), INDICATOR_VALUE DECIMAL(36,2), INSERT_TIME STRING) PARTITIONED BY(TIME_YEAR STRING, TIME_MONTH STRING,TIME_DAY STRING);
Neither of these Sqoop commands works.
Can you please help me understand how I can use --hive-partition-key for a mass import when more than one column is used to partition the table?
Created 07-07-2016 08:44 PM
@Parinita Kothari To import into a multi-key partitioned Hive table, you can make use of --hcatalog-table:
--table <tblname> --hcatalog-table <tblname> --hcatalog-partition-keys k1,k2 --hcatalog-partition-values 'v1,v2'
e.g.: --table DW_FACT_PUNTER_TEST_TEMP --hcatalog-table DW_FACT_PUNTER_TEST_TEMP --hcatalog-partition-keys TIME_YEAR,TIME_MONTH --hcatalog-partition-values '2014-04-11,2014-APR'
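Put together with the connection details from the question, a full invocation might look like this (a sketch only; the JDBC URL and credentials are copied from the question, and the partition values are illustrative). Note that the --hcatalog-* options replace the --hive-import family, so the two sets of options should not be combined:

```shell
sqoop import \
  --connect jdbc:oracle:thin:@//jfadboc1.jfa.unibet.com:1521/xxx \
  --username xxx --password xxx \
  --table DW_FACT_PUNTER_TEMP \
  --split-by TIME_KEY \
  --hcatalog-database unica \
  --hcatalog-table DW_FACT_PUNTER_TEST_TEMP \
  --hcatalog-partition-keys TIME_YEAR,TIME_MONTH,TIME_DAY \
  --hcatalog-partition-values '2016,07,07'
```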
Created 07-08-2016 09:20 AM
Thank you @Dileep Kumar Chiguruvada for your quick response.
Is it necessary to hardcode the value for --hcatalog-partition-values?
The table I am pulling data from (in Oracle) has data going back to 2001-01-01, so it would be nice to be able to use dynamic partitions here.
Created 07-11-2016 06:55 AM
@Parinita Kothari Yes, you can do that too.
- Create a partitioned Hive table and, for the dynamic case, set hive.exec.dynamic.partition.mode=nonstrict.
- Import the table from the DB into the partitioned table.
Here you need not specify partition keys and values; just --table <tblname> --hcatalog-table <tblname> is enough.
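For the dynamic case, the import can be sketched as follows (assuming the partitioned Hive table from the question already exists, hive.exec.dynamic.partition.mode=nonstrict is set on the Hive side as described above, and the connection details are copied from the thread):

```shell
# Import straight into the multi-key partitioned table.
# With no --hcatalog-partition-keys/--hcatalog-partition-values,
# the partition columns (TIME_YEAR, TIME_MONTH, TIME_DAY) are
# matched by name from the imported columns.
sqoop import \
  --connect jdbc:oracle:thin:@//jfadboc1.jfa.unibet.com:1521/xxx \
  --username xxx --password xxx \
  --table DW_FACT_PUNTER_TEMP \
  --split-by TIME_KEY \
  --hcatalog-database unica \
  --hcatalog-table DW_FACT_PUNTER_TEST_TEMP
```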
Created 07-13-2016 10:25 AM
Thank you @Dileep Kumar Chiguruvada, it works now. I had to use Sqoop to import the contents into a temp table (which wasn't partitioned) and then use this temp table to insert into the actual partitioned table. I couldn't find a direct way to use Sqoop to ingest data directly into a table partitioned on more than one column.
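The two-step workaround described above can be sketched in HiveQL (table and column names are taken from the question; the staging-table name is made up for illustration):

```sql
-- Stage the Sqoop import into an unpartitioned temp table,
-- carrying the partition columns as ordinary columns.
CREATE TABLE IF NOT EXISTS unica.DW_FACT_PUNTER_STAGE (
  PUNTER_KEY      DECIMAL(36,2),
  PRODUCT_KEY     DECIMAL(36,2),
  INDICATOR_KEY   DECIMAL(36,2),
  INDICATOR_VALUE DECIMAL(36,2),
  INSERT_TIME     STRING,
  TIME_YEAR       STRING,
  TIME_MONTH      STRING,
  TIME_DAY        STRING
);

-- Then move the rows into the partitioned table with dynamic
-- partitioning; the partition columns must come last in the SELECT.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE unica.DW_FACT_PUNTER_TEST_TEMP
PARTITION (TIME_YEAR, TIME_MONTH, TIME_DAY)
SELECT PUNTER_KEY, PRODUCT_KEY, INDICATOR_KEY, INDICATOR_VALUE,
       INSERT_TIME, TIME_YEAR, TIME_MONTH, TIME_DAY
FROM unica.DW_FACT_PUNTER_STAGE;
```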