Created 02-03-2016 04:01 AM
On HDP 2.3.4, using Oozie 4.2.0 and Sqoop 1.4.2, I'm trying to create a coordinator app that will execute sqoop jobs on a daily basis. I need the sqoop action to execute jobs because these are incremental imports.
I've configured `sqoop-site.xml` and started the `sqoop-metastore` and I'm able to create, list, and delete jobs via the command line but the workflow encounters the error: Cannot restore job: streamsummary_incremental
stderr
Sqoop command arguments : job --exec streamsummary_incremental Fetching child yarn jobs tag id : oozie-26fcd4dc0afd8f53316fc929ac38eae2 2016-02-03 09:46:47,193 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at <myhost>/<myIP>:8032 Child yarn jobs are found - ================================================================= >>> Invoking Sqoop command line now >>> 2241 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration. 2016-02-03 09:46:47,404 WARN [main] tool.SqoopTool (SqoopTool.java:loadPluginsFromConfDir(177)) - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration. 2263 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.6.2.3.4.0-3485 2016-02-03 09:46:47,426 INFO [main] sqoop.Sqoop (Sqoop.java:<init>(97)) - Running Sqoop version: 1.4.6.2.3.4.0-3485 2552 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage - Cannot restore job: streamsummary_incremental 2016-02-03 09:46:47,715 ERROR [main] hsqldb.HsqldbJobStorage (HsqldbJobStorage.java:read(254)) - Cannot restore job: streamsummary_incremental 2552 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage - (No such job) 2016-02-03 09:46:47,715 ERROR [main] hsqldb.HsqldbJobStorage (HsqldbJobStorage.java:read(255)) - (No such job) 2553 [main] ERROR org.apache.sqoop.tool.JobTool - I/O error performing job operation: java.io.IOException: Cannot restore missing job streamsummary_incremental at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.read(HsqldbJobStorage.java:256) at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:198) at org.apache.sqoop.tool.JobTool.run(JobTool.java:283) at org.apache.sqoop.Sqoop.run(Sqoop.java:148) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235) at org.apache.sqoop.Sqoop.main(Sqoop.java:244) at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197) at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:177) at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47) at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:46) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
sqoop-site.xml
<property> <name>sqoop.metastore.client.enable.autoconnect</name> <value>false</value> <description>If true, Sqoop will connect to a local metastore for job management when no other metastore arguments are provided. </description> </property> <property> <name>sqoop.metastore.client.autoconnect.url</name> <value>jdbc:hsqldb:hsql://<myhost>:12345</value> <description>The connect string to use when connecting to a job-management metastore. If unspecified, uses ~/.sqoop/. You can specify a different path here. </description> </property> <property> <name>sqoop.metastore.client.autoconnect.username</name> <value>SA</value> <description>The username to bind to the metastore. </description> </property> <property> <name>sqoop.metastore.client.autoconnect.password</name> <value></value> <description>The password to bind to the metastore. </description> </property> <property> <name>sqoop.metastore.server.location</name> <value>/tmp/sqoop-metastore/shared.db</value> <description>Path to the shared metastore database files. If this is not set, it will be placed in ~/.sqoop/. </description> </property> <property> <name>sqoop.metastore.server.port</name> <value>12345</value> <description>Port that this metastore should listen on. </description> </property>
workflow.xml
<action name="sqoop-import-job"> <sqoop xmlns="uri:oozie:sqoop-action:0.2"> <job-tracker>${jobTracker}</job-tracker> <name-node>${nameNode}</name-node> <prepare> <delete path="${outputDir}"/> </prepare> <arg>job</arg> <arg>--exec</arg> <arg>${jobId}</arg> </sqoop> <ok to="hive-load"/> <error to="kill-sqoop"/> </action>
Additional info:
I'm thinking maybe Oozie isn't able to connect to the metastore because we don't have sqoop server? Could anyone confirm this? If not that, could I have missed anything else?
Thanks!
Created 02-04-2016 01:37 AM
After some more testing, I finally resolved this issue by explicitly passing in the metastore URL in the workflow.xml like so:
<arg>job</arg> <arg>--meta-connect</arg> <arg>jdbc:hsqldb:hsql://<myhost>:12345/sqoop</arg> <arg>--exec</arg> <arg>myjob</arg>
I'm not exactly sure, but I think that Oozie tries to connect to a local metastore because it doesn't have a copy of `sqoop-site.xml` and so it doesn't know the metastore url (even though I'm running on a single-node configuration)?
Created 05-10-2016 01:13 AM
Unfortunately, I can't help you there. It's not something I've tried. You should create a new question for that to get better visibility and hopefully attract answers.