Support Questions


Could not get current time from database

Contributor

I'm trying to run an incremental import job on SQL Server:

 

sqoop job --create incjob \
-- import \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
--driver net.sourceforge.jtds.jdbc.Driver \
--username lrm0613 \
-P \
--table JobHistory \
--incremental lastmodified \
--check-column JobId \
--hcatalog-database dataengsandbox \
--hcatalog-table JobHistory \
-m 1 \
--merge-key JobId

 

When I run the job, I get the following error:

 

$ sqoop job --exec incjob
Warning: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/04/13 15:58:43 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2
Enter password:
20/04/13 15:58:48 WARN tool.BaseSqoopTool: Input field/record delimiter options are not used in HCatalog jobs unless the format is text. It is better to use --hive-import in those cases. For text formats
20/04/13 15:58:48 WARN tool.BaseSqoopTool: Output field/record delimiter options are not useful in HCatalog jobs for most of the output types except text based formats is text. It is better to use --hive-import in those cases. For non text formats,
20/04/13 15:58:48 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
20/04/13 15:58:48 INFO manager.SqlManager: Using default fetchSize of 1000
20/04/13 15:58:48 INFO tool.CodeGenTool: Beginning code generation
20/04/13 15:58:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM JobHistory AS t WHERE 1=0
20/04/13 15:58:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM JobHistory AS t WHERE 1=0
20/04/13 15:58:48 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-lrm0613/compile/f11a4812c94c4f349a579536733c37cb/JobHistory.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
20/04/13 15:58:52 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-lrm0613/compile/f11a4812c94c4f349a579536733c37cb/JobHistory.jar
20/04/13 15:58:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM JobHistory AS t WHERE 1=0
20/04/13 15:58:53 ERROR manager.SqlManager: SQL exception accessing current timestamp: net.sourceforge.jtds.jdbc.AQSQLException: Id 102, Level 15, State 1, Line 1
Incorrect syntax near ')'.
net.sourceforge.jtds.jdbc.AQSQLException: Id 102, Level 15, State 1, Line 1
Incorrect syntax near ')'.
at net.sourceforge.jtds.jdbc.SQLDiagnostic.addDiagnostic(SQLDiagnostic.java:377)
at net.sourceforge.jtds.jdbc.TdsCore.tdsErrorToken(TdsCore.java:2988)
at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2335)
at net.sourceforge.jtds.jdbc.TdsCore.getMoreResults(TdsCore.java:653)
at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:477)
at net.sourceforge.jtds.jdbc.JtdsStatement.executeQuery(JtdsStatement.java:1304)
at org.apache.sqoop.manager.SqlManager.getCurrentDbTimestamp(SqlManager.java:987)
at org.apache.sqoop.tool.ImportTool.initIncrementalConstraints(ImportTool.java:334)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:504)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:244)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:299)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
20/04/13 15:58:53 ERROR tool.ImportTool: Import failed: java.io.IOException: Could not get current time from database
at org.apache.sqoop.tool.ImportTool.initIncrementalConstraints(ImportTool.java:336)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:504)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:244)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:299)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)


3 REPLIES

Master Collaborator

From the log I see that sqoop ignored the jTDS JDBC driver you specified and fell back to the generic JDBC manager, which may use syntax for fetching the current timestamp that SQL Server does not understand. To make sure sqoop uses the right connection manager, add the following parameter:

 

--connection-manager org.apache.sqoop.manager.SQLServerManager 

 

Another thing I noticed in your command: you are using --incremental lastmodified with --check-column JobId. This asks sqoop to treat the JobId column as a timestamp, which I don't think is your intention. For more on this topic, see section 7.2.9 of the Sqoop documentation.
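Putting the two fixes together, the job definition might look like the sketch below. It is based on your original command; LastModified is a hypothetical datetime column — substitute whatever last-modified timestamp column your JobHistory table actually has:

```shell
# Sketch: same job as before, but with an explicit connection manager
# and a timestamp column (not JobId) as the incremental check column.
sqoop job --create incjob \
  -- import \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --username lrm0613 \
  -P \
  --table JobHistory \
  --incremental lastmodified \
  --check-column LastModified \
  --merge-key JobId \
  --hcatalog-database dataengsandbox \
  --hcatalog-table JobHistory \
  -m 1
```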

Contributor

Thanks a lot for your answer, aakulov!

 

I changed the command to the following:

 

sqoop job \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
-Dsqoop.export.records.per.statement=1 \
--create incjob \
-- import \
--connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
--username lrm0613 \
--password-alias sqlserver2.password \
--query 'select * from JobHistory where $CONDITIONS' \
--incremental lastmodified \
--check-column EventTime \
--hcatalog-database dataengsandbox \
--hcatalog-table JobHistory \
-m 1 \
--merge-key EventTime

 

It works now. However, I found that the query it runs internally is the following:

 

select * from JobHistory where [EventTime] < '2020-04-17 08:51:00.54' AND (1 = 0)

 

8:51 is the time when I ran the job, not the time the last row was inserted.

 

Will it then lose the rows inserted between the last time I ran it and now?

Master Collaborator

Glad things are moving forward for you, Heri.

 

Examining your sqoop command, I notice the following:

  • --check-column EventTime tells sqoop to use this column as the timestamp column in its select logic
  • --incremental lastmodified tells sqoop that your source SQL table can have records both added to it AND updated in it. Sqoop assumes that when a record is added or updated, its EventTime is set to the current timestamp.
  • When you run this job for the first time, sqoop will pick up ALL available records (the initial load). It will then print out a --last-value timestampX. This timestamp is the cutoff point for the next run of the job (i.e. the next time you run the job with --exec incjob, it will use --last-value timestampX)

So, to answer your question: it looks like sqoop treated the first run of your job as the initial load, with the upper bound [EventTime] < '2020-04-17 08:51:00.54'. When this job is kicked off again, it should pick up records from where it left off automatically, so the rows inserted in between will not be lost.
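If you want to double-check the cutoff sqoop saved, you can inspect the stored job definition (assuming the job metastore records the last value as an incremental.last.value property, which --show prints):

```shell
# Print the saved job definition and filter for the stored
# incremental cutoff that the next --exec run will start from.
sqoop job --show incjob | grep -i incremental
```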

 

If you want, you can provide a manual --last-value timestamp for the initial load, but make sure you don't reuse it on subsequent incremental runs.
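For illustration, a manual lower bound would be supplied like this (the timestamp here is made up; the rest of the job definition stays the same):

```shell
--incremental lastmodified \
--check-column EventTime \
--last-value '2020-04-01 00:00:00.0'
```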

 

For more details, please review sections 7.2.7 and 11.4 of the Sqoop documentation.

 

If this is helpful, don't forget to give kudos and accept the solution. Thank you!