Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Oozie Sqoop job - cannot restore job

avatar
Expert Contributor

On HDP 2.3.4, using Oozie 4.2.0 and Sqoop 1.4.2, I'm trying to create a coordinator app that will execute sqoop jobs on a daily basis. I need the sqoop action to execute jobs because these are incremental imports.

I've configured `sqoop-site.xml` and started the `sqoop-metastore` and I'm able to create, list, and delete jobs via the command line but the workflow encounters the error: Cannot restore job: streamsummary_incremental

stderr

Sqoop command arguments :
             job
             --exec
             streamsummary_incremental
Fetching child yarn jobs
tag id : oozie-26fcd4dc0afd8f53316fc929ac38eae2
2016-02-03 09:46:47,193 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at <myhost>/<myIP>:8032
Child yarn jobs are found - 
=================================================================

>>> Invoking Sqoop command line now >>>

2241 [main] WARN  org.apache.sqoop.tool.SqoopTool  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2016-02-03 09:46:47,404 WARN  [main] tool.SqoopTool (SqoopTool.java:loadPluginsFromConfDir(177)) - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2263 [main] INFO  org.apache.sqoop.Sqoop  - Running Sqoop version: 1.4.6.2.3.4.0-3485
2016-02-03 09:46:47,426 INFO  [main] sqoop.Sqoop (Sqoop.java:<init>(97)) - Running Sqoop version: 1.4.6.2.3.4.0-3485
2552 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage  - Cannot restore job: streamsummary_incremental
2016-02-03 09:46:47,715 ERROR [main] hsqldb.HsqldbJobStorage (HsqldbJobStorage.java:read(254)) - Cannot restore job: streamsummary_incremental
2552 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage  - (No such job)
2016-02-03 09:46:47,715 ERROR [main] hsqldb.HsqldbJobStorage (HsqldbJobStorage.java:read(255)) - (No such job)
2553 [main] ERROR org.apache.sqoop.tool.JobTool  - I/O error performing job operation: java.io.IOException: Cannot restore missing job streamsummary_incremental
	at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.read(HsqldbJobStorage.java:256)
	at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:198)
	at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
	at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197)
	at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:177)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
	at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:46)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

sqoop-site.xml

  <property>
    <name>sqoop.metastore.client.enable.autoconnect</name>
    <value>false</value>
    <description>If true, Sqoop will connect to a local metastore for job management when no other metastore arguments are provided.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.url</name>
    <value>jdbc:hsqldb:hsql://<myhost>:12345</value>
    <description>The connect string to use when connecting to a job-management metastore. If unspecified, uses ~/.sqoop/. You can specify a different path here.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.username</name>
    <value>SA</value>
    <description>The username to bind to the metastore.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.password</name>
    <value></value>
    <description>The password to bind to the metastore.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.server.location</name>
    <value>/tmp/sqoop-metastore/shared.db</value>
    <description>Path to the shared metastore database files. If this is not set, it will be placed in ~/.sqoop/.
    </description>
  </property>   
  <property>
    <name>sqoop.metastore.server.port</name>
    <value>12345</value>
    <description>Port that this metastore should listen on.
    </description>
  </property>

workflow.xml

  <action name="sqoop-import-job">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${outputDir}"/>
      </prepare>
      <arg>job</arg>
      <arg>--exec</arg>
      <arg>${jobId}</arg>
    </sqoop>
    <ok to="hive-load"/>
    <error to="kill-sqoop"/>
  </action>

Additional info:

  • We're only running a single-node cluster.
  • Only Sqoop Client is installed.

I'm thinking maybe Oozie isn't able to connect to the metastore because we don't have sqoop server? Could anyone confirm this? If not that, could I have missed anything else?

Thanks!

1 ACCEPTED SOLUTION

avatar
Expert Contributor

After some more testing, I finally resolved this issue by explicitly passing in the metastore URL in the workflow.xml like so:

<arg>job</arg>
<arg>--meta-connect</arg>
<arg>jdbc:hsqldb:hsql://<myhost>:12345/sqoop</arg>
<arg>--exec</arg>
<arg>myjob</arg>

I'm not exactly sure, but I think that Oozie tries to connect to a local metastore because it doesn't have a copy of `sqoop-site.xml` and so it doesn't know the metastore url (even though I'm running on a single-node configuration)?

View solution in original post

10 REPLIES 10

avatar
Expert Contributor

Unfortunately, I can't help you there. It's not something I've tried. You should create a new question for that to get better visibility and hopefully attract answers.