
Sqoop incremental import into Hive: I get an error saying Hive does not support append mode. How do I solve that?

Expert Contributor

sqoop incremental import

I got the error below. How can I solve it? I am using Hortonworks 2.3.

Append mode for hive imports is not yet supported. Please remove the parameter --append-mode

1 ACCEPTED SOLUTION

Master Guru

Hi @sivasaravanakumar k, yes, you are right, Sqoop indeed says that "Append mode for hive imports is not yet supported". However, it can be done by incremental import to HDFS and mapping your Hive table to Sqoop's target-dir. A full example is attached; here are the highlights:

  • Define your Hive table as external table
CREATE EXTERNAL TABLE h2 (id int, name STRING, ts TIMESTAMP) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE location '/user/it1/sqin5';
  • Initially it's empty. Do the first Sqoop import, loading 5000 entries from MySQL table st1 (with ids 1-5000) and setting target-dir to the location of our external table, /user/it1/sqin5:
sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --target-dir /user/it1/sqin5 -m 1 --incremental append --check-column id
16/01/25 13:36:07 INFO tool.ImportTool: Upper bound value: 5000
16/01/25 13:36:27 INFO mapreduce.ImportJobBase: Retrieved 5000 records.
  • If you check now in Hive, table h2 has 5000 entries. Now append 900 entries to MySQL table st1 (ids 5100-5999) and do an incremental append import, setting --last-value to 5000:
sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --target-dir /user/it1/sqin5 -m 1 --incremental append --check-column id --last-value 5000
16/01/25 13:38:23 INFO tool.ImportTool: Lower bound value: 5000
16/01/25 13:38:23 INFO tool.ImportTool: Upper bound value: 5999
16/01/25 13:38:47 INFO mapreduce.ImportJobBase: Retrieved 900 records.
  • If you check now, Hive table h2 has 5900 entries:
hive> select count(*) from h2;
5900
  • In the same way you can also handle Sqoop incremental imports into Hive based on "lastmodified" and "merge-key" (see the first sketch below this list).
  • You can also create a Sqoop job like in your other question and let Sqoop take care of last-value (second sketch below).
  • You can import into Hive managed (non-external) tables by setting Sqoop's target-dir to /apps/hive/warehouse/<table-name>. [That's what Sqoop does when using "--hive-import"; third sketch below.]
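For the "lastmodified" case, a minimal sketch, assuming st1 has a timestamp column ts that is updated on every change (the table, columns and connection options are the ones from the example above; the --last-value timestamp is made up):

# last-value below is hypothetical -- use the value reported by your previous run
sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --target-dir /user/it1/sqin5 -m 1 --incremental lastmodified --check-column ts --last-value "2016-01-25 13:36:07" --merge-key id

The --merge-key id part makes Sqoop run a merge job after the import, so rows that were updated in MySQL replace their old copies in the target directory instead of showing up twice in the Hive table.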

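A sketch of the saved-job variant (the job name st1-inc is made up; the import options are the ones used above), so that Sqoop stores and updates the last value between runs:

# create the job once; note the standalone -- before "import"
sqoop job --create st1-inc -- import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --target-dir /user/it1/sqin5 -m 1 --incremental append --check-column id --last-value 0
# run it whenever new rows should be picked up
sqoop job --exec st1-inc

Each --exec run continues from the last value saved by the previous one; sqoop job --show st1-inc displays what is currently stored.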
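And for completeness, a sketch of a plain (non-incremental) --hive-import, which writes into a managed table under the warehouse directory (the table name h2m is made up); note that combining --hive-import with --incremental append is exactly what triggers the warning in your question:

sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --hive-import --hive-table h2m -m 1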


Master Mentor
@sivasaravanakumar k

You have this in the docs:

  • sqoop import --connect jdbc:mysql://db.foo.com/bar --table EMPLOYEES --where "id > 100000" --target-dir /incremental_dataset --append

Expert Contributor

I got this warning message:

Append mode for hive imports is not yet supported. Please remove the parameter --append-mode

Master Mentor

@sivasaravanakumar k are you still having issues with this? Can you accept best answer or provide your workaround?


@sivasaravanakumar k Another way of doing incremental imports is to load each increment into a separate partition of the Hive table.

First create an external partitioned table

CREATE EXTERNAL TABLE h2 (id int, name STRING, ts TIMESTAMP) PARTITIONED BY (pt STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE location '/user/it1/sqin5';

Then Sqoop the data into external partitions by specifying the HDFS location of each partition:

sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --where "id <= 5000" --target-dir /user/it1/sqin5/pt=0 -m 1

Add partition to Hive table

alter table h2 add PARTITION (pt='0') LOCATION '/user/it1/sqin5/pt=0';

Verify the table count; it should return 5000 rows. Now run an incremental Sqoop import by specifying an appropriate where clause:

sqoop import --connect jdbc:mysql://localhost:3306/test --driver com.mysql.jdbc.Driver --username it1 --password hadoop --table st1 --where "id > 5000 and id <= 10000" --target-dir /user/it1/sqin5/pt=1 -m 1

Add 2nd partition to Hive table.

alter table h2 add PARTITION (pt='1') LOCATION '/user/it1/sqin5/pt=1';

Verify table count. It should return 10,000 rows.
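For example, a quick sanity check per partition (plain HiveQL against the h2 table defined above):

hive> select pt, count(*) from h2 group by pt;
hive> select count(*) from h2;

The first query should show 5000 rows in each of pt=0 and pt=1, and the second should return 10,000 in total.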