Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Concurrent access to the Sqoop metastore

avatar
Contributor

EDITED:

Just in case somebody ends up in this question trying to set MySQL as the Sqoop metastore, the instructions to do that are described in this post:

https://community.hortonworks.com/articles/55937/using-sqoop-with-mysql-as-metastore.html

Hi,

I have a single Hortonworks node, using Sqoop to import some tables from an SQLServer database. Whenever I try to execute two Oozie workflows simultaneously (each of which performs a Sqoop import job), I get an exception related to the hsqldb metastore.

3606 [main] ERROR org.apache.sqoop.tool.JobTool  - I/O error performing job operation: java.io.IOException: Exception creating SQL connection
        at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.init(HsqldbJobStorage.java:216)
        at org.apache.sqoop.metastore.hsqldb.AutoHsqldbStorage.open(AutoHsqldbStorage.java:112)
        at org.apache.sqoop.tool.JobTool.run(JobTool.java:274)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
        at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197)
        at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:177)
        at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
        at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:46)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.sql.SQLException: The database is already in use by another process: org.hsqldb.persist.NIOLockFile@9a5ed4be[file =/home/badevelop/.sqoop/metastore.db.lck, exists=true, locked=false, valid=false, fl =null]: java.lang.Exception: checkHeartbeat(): lock file [/home/david/.sqoop/metastore.db.lck] is presumably locked by another process.
        at org.hsqldb.jdbc.Util.sqlException(Unknown Source)
        at org.hsqldb.jdbc.jdbcConnection.<init>(Unknown Source)
        at org.hsqldb.jdbcDriver.getConnection(Unknown Source)
        at org.hsqldb.jdbcDriver.connect(Unknown Source)
        at java.sql.DriverManager.getConnection(DriverManager.java:664)
        at java.sql.DriverManager.getConnection(DriverManager.java:247)
        at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.init(HsqldbJobStorage.java:176)
        ... 25 more

I assume that when one of the processes gets the .lock file in order to access to the hsqldb database, it prevents the other from getting into the metastore, raising the exception. Is this behaviour normal using the default Sqoop configuration? (I haven't modified any config property in the sqoop-site.xml file or in the metastore.db.* scripts). Since the metastore is going to be used by many Oozie workflows, I don't understand why this message seems to imply that the access to the database has to be sequential.

Will changing the metastore to another database (for instance MySQL) solve the problem?

Any hint would be greatly appreciated.

1 ACCEPTED SOLUTION

avatar
Master Mentor

Yes you need to move to non-default database to avoid such issues. derby is not meant for concurrent access.

View solution in original post

4 REPLIES 4

avatar
Master Mentor

Yes you need to move to non-default database to avoid such issues. derby is not meant for concurrent access.

avatar
New Contributor

Would implement an own database to handle sqoop metastore have any potential trade off (beyond implementation itself) or everything that sqoop stores and retrieves from the derby default metastore can be implemented on a ad hoc one?

Do you think a MySql can be a good choice to implement a custom sqoop metastore or would you rather pick any other database format?

avatar
Contributor

Thanks, Artem.

As I read from another question you answered, you were not able to use MySql as a metastore (https://community.hortonworks.com/questions/22152/sqoop-metastore-usage-supported-databases-backup.h...

Do you think it is worth having a try with MySQL, or would you use another database? Maybe Postgres?

avatar
Master Mentor
@Jaime

the database flavor has nothing to do with it, it's a matter of Sqoop having that functionality, which at this moment it does not.