Sqoop doesn't update incremental.last.value when using MySQL metastore

We switched Sqoop to use a MySQL-backed metastore.

We then created a job as follows:

sqoop job --create myjob -- import --append --connect "jdbc:sqlserver://<server>;database=<database>" --username <databaseUser> --password <databasePassword> --table <table> --target-dir "<someDataDirectory>" --hive-import --hive-database <hiveDatabase> --incremental append --check-column <column> --last-value -9999999 --autoreset-to-one-mapper --hive-delims-replacement " " --outdir <outputdir> --bindir <bindir> -- -- --schema <databaseSchema>

The job is created successfully in the metastore. I queried MySQL and can see that it was created with the properties specified.
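
For example, a query along these lines (using the column names from Sqoop's default metastore layout; adjust if your schema differs) shows the stored job properties:

-- List the incremental-import properties Sqoop stored for the job.
-- propname/propval are the column names in Sqoop's default SQOOP_SESSIONS layout (an assumption here).
SELECT propname, propval
FROM SQOOP_SESSIONS
WHERE job_name = 'myjob'
  AND propname LIKE 'incremental%';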

However, when I run the job, the incremental.last.value is not updated.

What can I do to start debugging this issue?

** UPDATE **

I dug a little deeper: I enabled verbose logging and removed all of the --hive-import arguments.

I found the error that occurs when Sqoop tries to update the metastore. Below is a dump of the log, starting from where Sqoop connects to the ResourceManager and Application History server.

The error is at the bottom:

17/11/11 07:46:18 INFO client.RMProxy: Connecting to ResourceManager at <location>
17/11/11 07:46:19 INFO client.AHSProxy: Connecting to Application History server at <location>
17/11/11 07:46:54 DEBUG db.DBConfiguration: Fetching password from job credentials store
17/11/11 07:46:54 INFO db.DBInputFormat: Using read commited transaction isolation
17/11/11 07:46:54 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN([Id]), MAX([Id]) FROM [schema].[table] WHERE ( [Id] > -9999999 AND [Id] <= 500512 )
17/11/11 07:46:54 INFO db.IntegerSplitter: Split size: 125127; Num splits: 4 from: 1 to: 500512
17/11/11 07:46:54 DEBUG db.IntegerSplitter: Splits: [                           1 to                      500,512] into 4 parts
17/11/11 07:46:54 DEBUG db.IntegerSplitter:                            1
17/11/11 07:46:54 DEBUG db.IntegerSplitter:                      125,129
17/11/11 07:46:54 DEBUG db.IntegerSplitter:                      250,257
17/11/11 07:46:54 DEBUG db.IntegerSplitter:                      375,385
17/11/11 07:46:54 DEBUG db.IntegerSplitter:                      500,512
17/11/11 07:46:54 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '[Id] >= 1' and upper bound '[Id] < 125129'
17/11/11 07:46:54 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '[Id] >= 125129' and upper bound '[Id] < 250257'
17/11/11 07:46:54 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '[Id] >= 250257' and upper bound '[Id] < 375385'
17/11/11 07:46:54 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '[Id] >= 375385' and upper bound '[Id] <= 500512'
17/11/11 07:46:55 INFO mapreduce.JobSubmitter: number of splits:4
17/11/11 07:46:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1509726871681_3374
17/11/11 07:46:58 INFO impl.YarnClientImpl: Submitted application application_1509726871681_3374
17/11/11 07:46:58 INFO mapreduce.Job: The url to track the job: <url>
17/11/11 07:46:58 INFO mapreduce.Job: Running job: job_1509726871681_3374
17/11/11 07:47:05 INFO mapreduce.Job: Job job_1509726871681_3374 running in uber mode : false
17/11/11 07:47:05 INFO mapreduce.Job:  map 0% reduce 0%
17/11/11 07:47:13 INFO mapreduce.Job:  map 25% reduce 0%
17/11/11 07:47:14 INFO mapreduce.Job:  map 75% reduce 0%
17/11/11 07:47:15 INFO mapreduce.Job:  map 100% reduce 0%
17/11/11 07:47:15 INFO mapreduce.Job: Job job_1509726871681_3374 completed successfully
17/11/11 07:47:15 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=674124
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=476
                HDFS: Number of bytes written=175543
                HDFS: Number of read operations=16
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=8
        Job Counters
                Launched map tasks=4
                Other local map tasks=4
                Total time spent by all maps in occupied slots (ms)=44554
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=22277
                Total vcore-milliseconds taken by all map tasks=22277
                Total megabyte-milliseconds taken by all map tasks=34217472
        Map-Reduce Framework
                Map input records=863
                Map output records=863
                Input split bytes=476
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=367
                CPU time spent (ms)=15950
                Physical memory (bytes) snapshot=1008590848
                Virtual memory (bytes) snapshot=13268086784
                Total committed heap usage (bytes)=847773696
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=175543
17/11/11 07:47:15 INFO mapreduce.ImportJobBase: Transferred 171.4287 KB in 57.0946 seconds (3.0025 KB/sec)
17/11/11 07:47:15 INFO mapreduce.ImportJobBase: Retrieved 863 records.
17/11/11 07:47:15 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@9629756
17/11/11 07:47:15 INFO util.AppendUtils: Appending to directory table
17/11/11 07:47:15 INFO util.AppendUtils: Using found partition 4
17/11/11 07:47:16 DEBUG util.AppendUtils: Filename: _SUCCESS ignored
17/11/11 07:47:16 DEBUG util.AppendUtils: Filename: part-m-00000 repartitioned to: part-m-00004
17/11/11 07:47:16 DEBUG util.AppendUtils: Filename: part-m-00001 repartitioned to: part-m-00005
17/11/11 07:47:16 DEBUG util.AppendUtils: Filename: part-m-00002 repartitioned to: part-m-00006
17/11/11 07:47:16 DEBUG util.AppendUtils: Filename: part-m-00003 repartitioned to: part-m-00007
17/11/11 07:47:16 DEBUG util.AppendUtils: Deleting temporary folder 99599ad0da924af5979d77bb3e4fdfa1_table
17/11/11 07:47:16 INFO tool.ImportTool: Saving incremental import state to the metastore
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Checking for table: SQOOP_ROOT
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Found table: SQOOP_ROOT
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Looking up property sqoop.hsqldb.job.storage.version for version null
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage:  => 0
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Looking up property sqoop.hsqldb.job.info.table for version 0
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage:  => SQOOP_SESSIONS
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Checking for table: SQOOP_SESSIONS
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Found table: SQOOP_SESSIONS
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Creating job: myjob
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Job: myjob; Setting property sqoop.tool with class schema => import
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage: Job: myjob; Getting property sqoop.tool with class schema
17/11/11 07:47:16 DEBUG hsqldb.HsqldbJobStorage:  => import
17/11/11 07:48:07 DEBUG hsqldb.HsqldbJobStorage: Flushing current transaction
17/11/11 07:48:07 DEBUG hsqldb.HsqldbJobStorage: Closing connection
17/11/11 07:48:07 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Error communicating with database
        at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.createInternal(HsqldbJobStorage.java:426)
        at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.update(HsqldbJobStorage.java:445)
        at org.apache.sqoop.tool.ImportTool.saveIncrementalState(ImportTool.java:164)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:528)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
        at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:243)
        at org.apache.sqoop.tool.JobTool.run(JobTool.java:298)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
Caused by: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1078)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4187)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4119)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2570)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2731)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2820)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2159)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2462)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2379)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2363)
        at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.setV0Property(HsqldbJobStorage.java:707)
        at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.createInternal(HsqldbJobStorage.java:391)
        ... 12 more
17/11/11 07:48:07 DEBUG hsqldb.HsqldbJobStorage: Flushing current transaction
17/11/11 07:48:07 DEBUG hsqldb.HsqldbJobStorage: Closing connection

In short, Sqoop hits a lock wait timeout when it tries to update the MySQL metastore: "Lock wait timeout exceeded; try restarting transaction".

It is waiting on this statement:

UPDATE SQOOP_SESSIONS SET propval = 'import' WHERE job_name = 'myjob'

This doesn't seem like a valid statement, as that WHERE clause matches 62 rows.
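
While the job hangs, a query along these lines (using the MySQL 5.x InnoDB views in information_schema; newer MySQL versions expose this differently) should show which session is holding the conflicting lock:

-- Show the blocking vs. waiting transaction while the Sqoop UPDATE is stuck.
-- Assumes MySQL 5.5-5.7; on 8.0 the sys.innodb_lock_waits view is the equivalent.
SELECT r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;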

Any idea what could be causing this, or where to start debugging further?

1 ACCEPTED SOLUTION

Finally figured this out with the help of a StackOverflow post! The default storage engine when creating a table in MySQL is InnoDB, which is transactional, hence the lock wait error. I simply recreated the SQOOP_SESSIONS table using MyISAM instead of InnoDB.
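
Converting the existing table in place should work as well, roughly like this (run it against whichever database holds your Sqoop metastore):

-- Switch the sessions table off InnoDB so the metastore update no longer waits on row locks.
-- The metastore database name isn't shown here; connect to yours first.
ALTER TABLE SQOOP_SESSIONS ENGINE = MyISAM;

-- Confirm the engine change.
SHOW TABLE STATUS LIKE 'SQOOP_SESSIONS';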

3 REPLIES

@Andrew Duncan Can you try running the sqoop job with all of the incremental-append options placed before the --connect option? Something like this:

sqoop job --create myjob -- import --append --check-column <column> --incremental append --last-value -9999999 --connect "jdbc:sqlserver://<server>;database=<database>" --username <databaseUser> --password <databasePassword> --table <table> --target-dir "<someDataDirectory>" --hive-import --hive-database <hiveDatabase> --autoreset-to-one-mapper --hive-delims-replacement " " --outdir <outputdir> --bindir <bindir> -- -- --schema <databaseSchema>

Do let me know if it works.

This didn't solve the issue unfortunately.
