Created 01-23-2016 07:14 AM
sqoop incremental import
I got the error below. How can I solve it? I am using Hortonworks 2.3.
Append mode for hive imports is not yet supported. Please remove the parameter --append-mode
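For context, this error typically appears when --hive-import is combined with --incremental append (or --append) in Sqoop 1.4.x. A hypothetical command of that shape (the connection string, table, and Hive table name below are made up for illustration) would be:
sqoop import --connect jdbc:mysql://localhost:3306/test --username it1 -P --table st1 --hive-import --hive-table h2 --incremental append --check-column id --last-value 0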
Created 01-23-2016 11:36 AM
You have this in the docs:
sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES --where "id > 100000" --target-dir /incremental_dataset --append
Created 01-25-2016 04:41 AM
I got this warning message:
Append mode for hive imports is not yet supported. Please remove the parameter --append-mode
Created 01-25-2016 02:17 PM
Hi @sivasaravanakumar k, yes, you are right, Sqoop indeed says that "Append mode for hive imports is not yet supported". However, it can be done with an incremental import to HDFS and mapping your Hive table to Sqoop's target-dir. A full example is attached; here are the highlights:
CREATE EXTERNAL TABLE h2 (id int, name STRING, ts TIMESTAMP) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE location '/user/it1/sqin5';
sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --target-dir /user/it1/sqin5 -m 1 --incremental append --check-column id
16/01/25 13:36:07 INFO tool.ImportTool: Upper bound value: 5000
16/01/25 13:36:27 INFO mapreduce.ImportJobBase: Retrieved 5000 records.
sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --target-dir /user/it1/sqin5 -m 1 --incremental append --check-column id --last-value 5000
16/01/25 13:38:23 INFO tool.ImportTool: Lower bound value: 5000
16/01/25 13:38:23 INFO tool.ImportTool: Upper bound value: 5999
16/01/25 13:38:47 INFO mapreduce.ImportJobBase: Retrieved 900 records.
hive> select count(*) from h2;
5900
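A side note, not from the original post: if you want Sqoop to remember the --last-value between runs instead of passing it by hand, the incremental import can be wrapped in a saved Sqoop job, roughly like this (the job name st1_incr is made up):
sqoop job --create st1_incr -- import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --target-dir /user/it1/sqin5 -m 1 --incremental append --check-column id --last-value 0
sqoop job --exec st1_incr
Each --exec records the highest value of the check column in the Sqoop metastore, so the next run only pulls newer rows.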
Created 02-03-2016 01:35 AM
@sivasaravanakumar k are you still having issues with this? Can you accept best answer or provide your workaround?
Created 02-18-2016 01:31 PM
@sivasaravanakumar k Another way of doing the incremental import is to load each increment as a separate partition of the Hive table.
First, create an external partitioned table:
CREATE EXTERNAL TABLE h2 (id int, name STRING, ts TIMESTAMP) PARTITIONED BY (pt STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE location '/user/it1/sqin5';
Then Sqoop the data into external partitions by specifying the HDFS location of each partition:
sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --where "id < 5000" --target-dir /user/it1/sqin5/pt=0 -m 1
Add partition to Hive table
alter table h2 add PARTITION(pt=0) LOCATION '/user/it1/sqin5/pt=0';
Verify the table count; it should return 5000 rows. Now run an incremental Sqoop import with an appropriate where clause:
sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --where "id >= 5000 and id < 10000" --target-dir /user/it1/sqin5/pt=1 -m 1
Add 2nd partition to Hive table.
alter table h2 add PARTITION(pt=1) LOCATION '/user/it1/sqin5/pt=1';
Verify table count. It should return 10,000 rows.
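To make the verification steps above concrete, the checks would look roughly like this in the Hive shell (table name h2 as in the example; the counts assume the example data):
hive> show partitions h2;    -- should list pt=0 and pt=1
hive> select count(*) from h2;    -- should return 10000 after the second partition is added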