Created 10-31-2017 03:34 PM
We switched Sqoop to use a MySQL-backed metastore.
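For context, the switch itself is just the metastore client autoconnect properties in sqoop-site.xml; ours looks roughly like this (host and credentials below are placeholders, not our real values):

<!-- sqoop-site.xml: point the Sqoop metastore client at MySQL -->
<property>
  <name>sqoop.metastore.client.enable.autoconnect</name>
  <value>true</value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>jdbc:mysql://<metastoreHost>:3306/sqoop</value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.username</name>
  <value><metastoreUser></value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.password</name>
  <value><metastorePassword></value>
</property>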
We then created a job as follows:
sqoop job --create myjob -- import --append --connect "jdbc:sqlserver://<server>;database=<database>" --username <databaseUser> --password <databasePassword> --table <table> --target-dir "<someDataDirectory>" --hive-import --hive-database <hiveDatabase> --incremental append --check-column <column> --last-value -9999999 --autoreset-to-one-mapper --hive-delims-replacement " " --outdir <outputdir> --bindir <bindir> -- -- --schema <databaseSchema>
The job is created successfully in the metastore. I queried MySQL and can see that it was created with the properties specified.
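For reference, this is roughly the query I used to confirm that (SQOOP_SESSIONS is the table name that also shows up in the debug log further down; column names may differ slightly by Sqoop version):

-- List everything Sqoop stored for the job, including incremental.last.value
SELECT propname, propval
FROM SQOOP_SESSIONS
WHERE job_name = 'myjob'
ORDER BY propname;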
However, when I run the job, the incremental.last.value is not updated.
What can I do to start debugging this issue?
** UPDATE **
I dug a little deeper. I've enabled verbose logging and removed all --hive-import arguments.
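For completeness, the verbose re-run was along these lines (arguments after the trailing -- are passed through to the saved job's underlying import tool, and --verbose is one of Sqoop's common arguments):

# Re-run the saved job with verbose logging enabled
sqoop job --exec myjob -- --verbose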
I found the error that occurs when Sqoop tries to update the metastore. Below is a dump of the log, starting where Sqoop connects to the ResourceManager and Application History server.
The error is at the bottom:
17/11/11 07:46:18 INFO client.RMProxy: Connecting to ResourceManager at <location>
17/11/11 07:46:19 INFO client.AHSProxy: Connecting to Application History server at <location>
17/11/11 07:46:54 DEBUG db.DBConfiguration: Fetching password from job credentials store
17/11/11 07:46:54 INFO db.DBInputFormat: Using read commited transaction isolation
17/11/11 07:46:54 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN([Id]), MAX([Id]) FROM [schema].[table] WHERE ( [Id] > -9999999 AND [Id] <= 500512 )
17/11/11 07:46:54 INFO db.IntegerSplitter: Split size: 125127; Num splits: 4 from: 1 to: 500512
17/11/11 07:46:54 DEBUG db.IntegerSplitter: Splits: [ 1 to 500,512] into 4 parts
17/11/11 07:46:54 DEBUG db.IntegerSplitter: 1
17/11/11 07:46:54 DEBUG db.IntegerSplitter: 125,129
17/11/11 07:46:54 DEBUG db.IntegerSplitter: 250,257
17/11/11 07:46:54 DEBUG db.IntegerSplitter: 375,385
17/11/11 07:46:54 DEBUG db.IntegerSplitter: 500,512
17/11/11 07:46:54 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '[Id] >= 1' and upper bound '[Id] < 125129'
17/11/11 07:46:54 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '[Id] >= 125129' and upper bound '[Id] < 250257'
17/11/11 07:46:54 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '[Id] >= 250257' and upper bound '[Id] < 375385'
17/11/11 07:46:54 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '[Id] >= 375385' and upper bound '[Id] <= 500512'
17/11/11 07:46:55 INFO mapreduce.JobSubmitter: number of splits:4
17/11/11 07:46:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1509726871681_3374
17/11/11 07:46:58 INFO impl.YarnClientImpl: Submitted application application_1509726871681_3374
17/11/11 07:46:58 INFO mapreduce.Job: The url to track the job: <url>
17/11/11 07:46:58 INFO mapreduce.Job: Running job: job_1509726871681_3374
17/11/11 07:47:05 INFO mapreduce.Job: Job job_1509726871681_3374 running in uber mode : false
17/11/11 07:47:05 INFO mapreduce.Job: map 0% reduce 0%
17/11/11 07:47:13 INFO mapreduce.Job: map 25% reduce 0%
17/11/11 07:47:14 INFO mapreduce.Job: map 75% reduce 0%
17/11/11 07:47:15 INFO mapreduce.Job: map 100% reduce 0%
17/11/11 07:47:15 INFO mapreduce.Job: Job job_1509726871681_3374 completed successfully
17/11/11 07:47:15 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=674124
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=476
HDFS: Number of bytes written=175543
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=44554
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=22277
Total vcore-milliseconds taken by all map tasks=22277
Total megabyte-milliseconds taken by all map tasks=34217472
Map-Reduce Framework
Map input records=863
Map output records=863
Input split bytes=476
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=367
CPU time spent (ms)=15950
Physical memory (bytes) snapshot=1008590848
Virtual memory (bytes) snapshot=13268086784
Total committed heap usage (bytes)=847773696
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=175543
17/11/11 07:47:15 INFO mapreduce.ImportJobBase: Transferred 171.4287 KB in 57.0946 seconds (3.0025 KB/sec)
17/11/11 07:47:15 INFO mapreduce.ImportJobBase: Retrieved 863 records.
17/11/11 07:47:15 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@9629756
17/11/11 07:47:15 INFO util.AppendUtils: Appending to directory table
17/11/11 07:47:15 INFO util.AppendUtils: Using found partition 4
17/11/11 07:47:16 DEBUG util.AppendUtils: Filename: _SUCCESS ignored
17/11/11 07:47:16 DEBUG util.AppendUtils: Filename: part-m-00000 repartitioned to: part-m-00004
17/11/11 07:47:16 DEBUG util.AppendUtils: Filename: part-m-00001 repartitioned to: part-m-00005
17/11/11 07:47:16 DEBUG util.AppendUtils: Filename: part-m-00002 repartitioned to: part-m-00006
17/11/11 07:47:16 DEBUG util.AppendUtils: Filename: part-m-00003 repartitioned to: part-m-00007
17/11/11 07:47:16 DEBUG util.AppendUtils: Deleting temporary folder 99599ad0da924af5979d77bb3e4fdfa1_table
17/11/11 07:47:16 INFO tool.ImportTool: Saving incremental import state to the metastore
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Checking for table: SQOOP_ROOT
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Found table: SQOOP_ROOT
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Looking up property sqoop.hsqldb.job.storage.version for version null
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: => 0
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Looking up property sqoop.hsqldb.job.info.table for version 0
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: => SQOOP_SESSIONS
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Checking for table: SQOOP_SESSIONS
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Found table: SQOOP_SESSIONS
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Creating job: myjob
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Job: myjob; Setting property sqoop.tool with class schema => import
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Job: myjob; Getting property sqoop.tool with class schema
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: => import
17/11/11 07:48:07 DEBUG hsqldb.HsqldbJobStorage: Flushing current transaction
17/11/11 07:48:07 DEBUG hsqldb.HsqldbJobStorage: Closing connection
17/11/11 07:48:07 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Error communicating with database
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.createInternal(HsqldbJobStorage.java:426)
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.update(HsqldbJobStorage.java:445)
at org.apache.sqoop.tool.ImportTool.saveIncrementalState(ImportTool.java:164)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:528)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:243)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:298)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
Caused by: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1078)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4187)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4119)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2570)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2731)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2820)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2159)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2462)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2379)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2363)
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.setV0Property(HsqldbJobStorage.java:707)
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.createInternal(HsqldbJobStorage.java:391)
... 12 more
17/11/11 07:48:07 DEBUG hsqldb.HsqldbJobStorage: Flushing current transaction
17/11/11 07:48:07 DEBUG hsqldb.HsqldbJobStorage: Closing connection
In short, Sqoop's update to the MySQL metastore appears to be blocked by a lock held by another transaction: "Lock wait timeout exceeded; try restarting transaction".
The statement it times out on is:
UPDATE SQOOP_SESSIONS SET propval = 'import' WHERE job_name = 'myjob'
This doesn't seem like a valid statement either, since 62 rows in SQOOP_SESSIONS match that WHERE clause.
Any idea what could be causing this, or where to start debugging further?
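In the meantime, while the job sits in that ~50-second wait, I'm assuming the blocking transaction can be seen from the MySQL side with something like the following (MySQL 5.x views; in 8.0 the lock views moved to performance_schema):

-- Show open InnoDB transactions and which ones are waiting on locks
SELECT * FROM information_schema.INNODB_TRX;
SELECT * FROM information_schema.INNODB_LOCK_WAITS;

-- Full engine status dump, which includes the most recent lock wait
SHOW ENGINE INNODB STATUS;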
Created 10-31-2017 06:40 PM
@Andrew Duncan Can you try running the Sqoop job with all the incremental append options placed before the --connect option? Something like this:
sqoop job --create myjob -- import --append --check-column <column> --incremental append --last-value -9999999 --connect "jdbc:sqlserver://<server>;database=<database>" --username <databaseUser> --password <databasePassword> --table <table> --target-dir "<someDataDirectory>" --hive-import --hive-database <hiveDatabase> --autoreset-to-one-mapper --hive-delims-replacement " " --outdir <outputdir> --bindir <bindir> -- -- --schema <databaseSchema>
Do let me know if it works.
Created 11-03-2017 12:55 PM
This didn't solve the issue unfortunately.
Created 11-11-2017 04:22 PM
Finally figured this out with the help of a StackOverflow post! The default engine MySQL uses when creating a table is InnoDB, which introduces transactions and row locking, hence the lock wait timeout. I simply recreated the SQOOP_SESSIONS table using MyISAM instead of InnoDB.
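For anyone else hitting this: rather than dropping and recreating the table, converting it in place should also work, something along these lines run against the metastore database (I only changed SQOOP_SESSIONS):

-- Switch the Sqoop job table to MyISAM; MyISAM has no transactions, so the lock wait goes away
ALTER TABLE SQOOP_SESSIONS ENGINE=MyISAM;

-- Confirm the engine change took effect
SHOW TABLE STATUS LIKE 'SQOOP_SESSIONS';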