Created 10-31-2017 03:34 PM
We switched sqoop to use a MySQL-backed metastore.
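(For reference, this amounts to pointing the client autoconnect settings in sqoop-site.xml at MySQL; a minimal sketch with placeholder values, assuming the standard Sqoop 1 property names:)

<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>jdbc:mysql://<metastoreHost>/<metastoreDatabase></value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.username</name>
  <value><metastoreUser></value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.password</name>
  <value><metastorePassword></value>
</property>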
We then created a job as follows:
sqoop job --create myjob -- import --append --connect "jdbc:sqlserver://<server>;database=<database>" --username <databaseUser> --password <databasePassword> --table <table> --target-dir "<someDataDirectory>" --hive-import --hive-database <hiveDatabase> --incremental append --check-column <column> --last-value -9999999 --autoreset-to-one-mapper --hive-delims-replacement " " --outdir <outputdir> --bindir <bindir> -- -- --schema <databaseSchema>
The job is created successfully in the metastore. I queried MySQL and can see that it was created with the properties specified.
However, when I run the job, the incremental.last.value is not updated.
What can I do to start debugging this issue?
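For reference, the stored value can be inspected directly after each run; something like this (sqoop job --show prints the saved job definition, and the job properties live in the SQOOP_SESSIONS table of the metastore database):

sqoop job --show myjob | grep incremental.last.value

SELECT * FROM SQOOP_SESSIONS WHERE job_name = 'myjob';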
** UPDATE **
I dug a little deeper. I enabled verbose logging and removed all of the --hive-import related arguments.
I found the error that occurs when sqoop tries to update the metastore. Below is a dump of the log, starting from where sqoop connects to the ResourceManager and Application History server.
The error is at the bottom:
17/11/11 07:46:18 INFO client.RMProxy: Connecting to ResourceManager at <location>
17/11/11 07:46:19 INFO client.AHSProxy: Connecting to Application History server at <location>
17/11/11 07:46:54 DEBUG db.DBConfiguration: Fetching password from job credentials store
17/11/11 07:46:54 INFO db.DBInputFormat: Using read commited transaction isolation
17/11/11 07:46:54 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN([Id]), MAX([Id]) FROM [schema].[table] WHERE ( [Id] > -9999999 AND [Id] <= 500512 )
17/11/11 07:46:54 INFO db.IntegerSplitter: Split size: 125127; Num splits: 4 from: 1 to: 500512
17/11/11 07:46:54 DEBUG db.IntegerSplitter: Splits: [ 1 to 500,512] into 4 parts
17/11/11 07:46:54 DEBUG db.IntegerSplitter: 1
17/11/11 07:46:54 DEBUG db.IntegerSplitter: 125,129
17/11/11 07:46:54 DEBUG db.IntegerSplitter: 250,257
17/11/11 07:46:54 DEBUG db.IntegerSplitter: 375,385
17/11/11 07:46:54 DEBUG db.IntegerSplitter: 500,512
17/11/11 07:46:54 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '[Id] >= 1' and upper bound '[Id] < 125129'
17/11/11 07:46:54 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '[Id] >= 125129' and upper bound '[Id] < 250257'
17/11/11 07:46:54 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '[Id] >= 250257' and upper bound '[Id] < 375385'
17/11/11 07:46:54 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '[Id] >= 375385' and upper bound '[Id] <= 500512'
17/11/11 07:46:55 INFO mapreduce.JobSubmitter: number of splits:4
17/11/11 07:46:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1509726871681_3374
17/11/11 07:46:58 INFO impl.YarnClientImpl: Submitted application application_1509726871681_3374
17/11/11 07:46:58 INFO mapreduce.Job: The url to track the job: <url>
17/11/11 07:46:58 INFO mapreduce.Job: Running job: job_1509726871681_3374
17/11/11 07:47:05 INFO mapreduce.Job: Job job_1509726871681_3374 running in uber mode : false
17/11/11 07:47:05 INFO mapreduce.Job: map 0% reduce 0%
17/11/11 07:47:13 INFO mapreduce.Job: map 25% reduce 0%
17/11/11 07:47:14 INFO mapreduce.Job: map 75% reduce 0%
17/11/11 07:47:15 INFO mapreduce.Job: map 100% reduce 0%
17/11/11 07:47:15 INFO mapreduce.Job: Job job_1509726871681_3374 completed successfully
17/11/11 07:47:15 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=674124
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=476
        HDFS: Number of bytes written=175543
        HDFS: Number of read operations=16
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=8
    Job Counters
        Launched map tasks=4
        Other local map tasks=4
        Total time spent by all maps in occupied slots (ms)=44554
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=22277
        Total vcore-milliseconds taken by all map tasks=22277
        Total megabyte-milliseconds taken by all map tasks=34217472
    Map-Reduce Framework
        Map input records=863
        Map output records=863
        Input split bytes=476
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=367
        CPU time spent (ms)=15950
        Physical memory (bytes) snapshot=1008590848
        Virtual memory (bytes) snapshot=13268086784
        Total committed heap usage (bytes)=847773696
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=175543
17/11/11 07:47:15 INFO mapreduce.ImportJobBase: Transferred 171.4287 KB in 57.0946 seconds (3.0025 KB/sec)
17/11/11 07:47:15 INFO mapreduce.ImportJobBase: Retrieved 863 records.
17/11/11 07:47:15 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@9629756
17/11/11 07:47:15 INFO util.AppendUtils: Appending to directory table
17/11/11 07:47:15 INFO util.AppendUtils: Using found partition 4
17/11/11 07:47:16 DEBUG util.AppendUtils: Filename: _SUCCESS ignored
17/11/11 07:47:16 DEBUG util.AppendUtils: Filename: part-m-00000 repartitioned to: part-m-00004
17/11/11 07:47:16 DEBUG util.AppendUtils: Filename: part-m-00001 repartitioned to: part-m-00005
17/11/11 07:47:16 DEBUG util.AppendUtils: Filename: part-m-00002 repartitioned to: part-m-00006
17/11/11 07:47:16 DEBUG util.AppendUtils: Filename: part-m-00003 repartitioned to: part-m-00007
17/11/11 07:47:16 DEBUG util.AppendUtils: Deleting temporary folder 99599ad0da924af5979d77bb3e4fdfa1_table
17/11/11 07:47:16 INFO tool.ImportTool: Saving incremental import state to the metastore
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Checking for table: SQOOP_ROOT
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Found table: SQOOP_ROOT
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Looking up property sqoop.hsqldb.job.storage.version for version null
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: => 0
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Looking up property sqoop.hsqldb.job.info.table for version 0
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: => SQOOP_SESSIONS
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Checking for table: SQOOP_SESSIONS
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Found table: SQOOP_SESSIONS
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Creating job: myjob
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Job: myjob; Setting property sqoop.tool with class schema => import
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Job: myjob; Getting property sqoop.tool with class schema
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: => import
17/11/11 07:48:07 DEBUG hsqldb.HsqldbJobStorage: Flushing current transaction
17/11/11 07:48:07 DEBUG hsqldb.HsqldbJobStorage: Closing connection
17/11/11 07:48:07 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Error communicating with database
    at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.createInternal(HsqldbJobStorage.java:426)
    at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.update(HsqldbJobStorage.java:445)
    at org.apache.sqoop.tool.ImportTool.saveIncrementalState(ImportTool.java:164)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:528)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
    at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:243)
    at org.apache.sqoop.tool.JobTool.run(JobTool.java:298)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
Caused by: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1078)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4187)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4119)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2570)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2731)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2820)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2159)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2462)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2379)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2363)
    at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.setV0Property(HsqldbJobStorage.java:707)
    at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.createInternal(HsqldbJobStorage.java:391)
    ... 12 more
17/11/11 07:48:07 DEBUG hsqldb.HsqldbJobStorage: Flushing current transaction
17/11/11 07:48:07 DEBUG hsqldb.HsqldbJobStorage: Closing connection
In short, sqoop's transaction times out waiting for a lock when it tries to update the MySQL metastore: Lock wait timeout exceeded; try restarting transaction
The lock wait happens on this statement:
UPDATE SQOOP_SESSIONS SET propval = 'import' WHERE job_name = 'myjob'
That doesn't look like a valid statement, since 62 rows match that WHERE clause, so it would overwrite propval for every property stored for the job.
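While the update hangs, the blocking transaction should be visible from the MySQL side; something like this (view names as in MySQL 5.5-5.7; they moved to performance_schema in 8.0):

SHOW ENGINE INNODB STATUS;
SELECT * FROM information_schema.INNODB_LOCK_WAITS;
SELECT trx_id, trx_state, trx_started, trx_query FROM information_schema.INNODB_TRX;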
Any idea what could be causing this, or where to start debugging further?
Created 10-31-2017 06:40 PM
@Andrew Duncan Can you try running the sqoop job with all options related to the incremental append placed before the --connect option? Something like this -
sqoop job --create myjob -- import --append --check-column <column> --incremental append --last-value -9999999 --connect "jdbc:sqlserver://<server>;database=<database>" --username <databaseUser> --password <databasePassword> --table <table> --target-dir "<someDataDirectory>" --hive-import --hive-database <hiveDatabase> --autoreset-to-one-mapper --hive-delims-replacement " " --outdir <outputdir> --bindir <bindir> -- -- --schema <databaseSchema>
Do let me know if it works.
Created 11-03-2017 12:55 PM
This didn't solve the issue unfortunately.
Created 11-11-2017 04:22 PM
Finally figured this out with the help of this StackOverflow post! The default storage engine when creating a table in MySQL is InnoDB, which is transactional, and that is what triggers the lock wait error. I simply recreated the SQOOP_SESSIONS table using MyISAM instead of InnoDB.
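In case it helps anyone else: I recreated the table, but an ALTER against the metastore database should have the same effect (back up SQOOP_SESSIONS first), e.g.

ALTER TABLE SQOOP_SESSIONS ENGINE = MyISAM;
SHOW TABLE STATUS LIKE 'SQOOP_SESSIONS';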