Created 04-13-2020 01:13 PM
I'm trying to run an incremental import job on SQL Server:
sqoop job --create incjob \
-- import \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
--driver net.sourceforge.jtds.jdbc.Driver \
--username lrm0613 \
-P \
--table JobHistory \
--incremental lastmodified \
--check-column JobId \
--hcatalog-database dataengsandbox \
--hcatalog-table JobHistory \
-m 1 \
--merge-key JobId
When I run the job, I get the following error:
$ sqoop job --exec incjob
Warning: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/04/13 15:58:43 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2
Enter password:
20/04/13 15:58:48 WARN tool.BaseSqoopTool: Input field/record delimiter options are not used in HCatalog jobs unless the format is text. It is better to use --hive-import in those cases. For text formats
20/04/13 15:58:48 WARN tool.BaseSqoopTool: Output field/record delimiter options are not useful in HCatalog jobs for most of the output types except text based formats is text. It is better to use --hive-import in those cases. For non text formats,
20/04/13 15:58:48 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
20/04/13 15:58:48 INFO manager.SqlManager: Using default fetchSize of 1000
20/04/13 15:58:48 INFO tool.CodeGenTool: Beginning code generation
20/04/13 15:58:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM JobHistory AS t WHERE 1=0
20/04/13 15:58:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM JobHistory AS t WHERE 1=0
20/04/13 15:58:48 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-lrm0613/compile/f11a4812c94c4f349a579536733c37cb/JobHistory.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
20/04/13 15:58:52 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-lrm0613/compile/f11a4812c94c4f349a579536733c37cb/JobHistory.jar
20/04/13 15:58:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM JobHistory AS t WHERE 1=0
20/04/13 15:58:53 ERROR manager.SqlManager: SQL exception accessing current timestamp: java.sql.SQLException: Id 102, Level 15, State 1, Line 1
Incorrect syntax near ')'.
java.sql.SQLException: Id 102, Level 15, State 1, Line 1
Incorrect syntax near ')'.
at net.sourceforge.jtds.jdbc.SQLDiagnostic.addDiagnostic(SQLDiagnostic.java:377)
at net.sourceforge.jtds.jdbc.TdsCore.tdsErrorToken(TdsCore.java:2988)
at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2335)
at net.sourceforge.jtds.jdbc.TdsCore.getMoreResults(TdsCore.java:653)
at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:477)
at net.sourceforge.jtds.jdbc.JtdsStatement.executeQuery(JtdsStatement.java:1304)
at org.apache.sqoop.manager.SqlManager.getCurrentDbTimestamp(SqlManager.java:987)
at org.apache.sqoop.tool.ImportTool.initIncrementalConstraints(ImportTool.java:334)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:504)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:244)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:299)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
20/04/13 15:58:53 ERROR tool.ImportTool: Import failed: java.io.IOException: Could not get current time from database
at org.apache.sqoop.tool.ImportTool.initIncrementalConstraints(ImportTool.java:336)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:504)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:244)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:299)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Created 04-14-2020 04:12 PM
From the log I can see that Sqoop ignored your specified jTDS JDBC driver and fell back to the generic JDBC manager. That manager fetches the current database time with SELECT CURRENT_TIMESTAMP(), which is not valid T-SQL (SQL Server expects CURRENT_TIMESTAMP without parentheses), hence the "Incorrect syntax near ')'" error. To make sure Sqoop uses a SQL Server-aware connection manager, add the following parameter:
--connection-manager org.apache.sqoop.manager.SQLServerManager
Another thing I noticed in your command: you are using --incremental lastmodified with --check-column JobId. This asks Sqoop to treat the JobId column as a modification timestamp, which I don't think is your intention. For more on this topic, see section 7.2.9 of the Sqoop documentation.
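For illustration, the job definition with both fixes applied might look like the sketch below. I'm assuming here that EventTime is a column on JobHistory that tracks when a row was last modified; substitute whatever timestamp column your table actually has:
sqoop job --create incjob \
-- import \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
--driver net.sourceforge.jtds.jdbc.Driver \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--username lrm0613 \
-P \
--table JobHistory \
--incremental lastmodified \
--check-column EventTime \
--merge-key JobId \
--hcatalog-database dataengsandbox \
--hcatalog-table JobHistory \
-m 1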
Created 04-17-2020 05:56 AM
Thanks a lot for your answer, aakulov!
I changed the command to the following format:
sqoop job \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
-Dsqoop.export.records.per.statement=1 \
--create incjob \
-- import \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
--username lrm0613 \
--password-alias sqlserver2.password \
--query 'select * from JobHistory where $CONDITIONS' \
--incremental lastmodified \
--check-column EventTime \
--hcatalog-database dataengsandbox \
--hcatalog-table JobHistory \
-m 1 \
--merge-key EventTime
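For reference, a password alias like sqlserver2.password is created beforehand with the Hadoop credential CLI, pointing at the same provider path the job references. Hadoop prompts for the password and stores it encrypted in the jceks keystore:
hadoop credential create sqlserver2.password -provider jceks://hdfs/user/lrm0613/mydb2.jceks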
It works now; however, I found that the query it runs internally is the following:
select * from JobHistory where [EventTime] < '2020-04-17 08:51:00.54' AND (1 = 0)
8:51 is the time when I ran it, not the time the last row was inserted.
Will it then lose the rows inserted between the last time I ran it and now?
Created on 04-17-2020 07:58 AM - edited 04-17-2020 09:35 AM
Glad things are moving forward for you, Heri.
Examining your sqoop command and the generated query, here is what is happening: since the job has no saved last value yet, Sqoop treats the first run as the initial load, importing everything with [EventTime] < '2020-04-17 08:51:00.54' and then saving that upper-bound timestamp as the job's last value. When this job is kicked off again, it imports rows with EventTime at or after the saved value, so it picks up records from where it left off automatically and no rows are lost.
If you want, you can provide a manual --last-value timestamp for the initial load, but make sure you don't use it on subsequent incremental loads.
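For example, after the first run you can inspect the watermark Sqoop saved for the job (a sketch; as far as I know the stored property is named incremental.last.value):
sqoop job --show incjob | grep -i last.value
The timestamp printed there becomes the lower bound of the next sqoop job --exec incjob run, and it is updated again after each successful execution.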
For more details, please review sections 7.2.7 and 11.4 of the Sqoop documentation.
If this is helpful, don't forget to give kudos and accept the solution. Thank you!