Member since
12-10-2015
43
Posts
39
Kudos Received
3
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
4446 | 02-04-2016 01:37 AM | |
15014 | 02-03-2016 02:03 AM | |
6827 | 01-26-2016 08:00 AM |
05-10-2016
01:13 AM
Unfortunately, I can't help you there. It's not something I've tried. You should create a new question for that to get better visibility and hopefully attract answers.
... View more
02-09-2016
06:17 AM
2 Kudos
Hi @Shigeru Takehara, the jdbc driver was in the Oozie sharelib. Also, the import was working fine - I see the data within HDFS. It's just the loading into Hive that fails. That's why I just opted to break the import into two separate steps.
... View more
02-04-2016
02:24 AM
I find that odd and such a shame. It seems like a metastore service is something that would be an important requirement in order to run incremental imports as jobs and then called from a coordinator app. I did see this blog describing how to manually set-up mysql to work as the metastore, though I never tried it myself. I wonder if this would be preferable to running `sqoop metastore &`?
... View more
02-04-2016
01:49 AM
Thanks for all of your help so far, Artem. I do have a question regarding the metastore though - hopefully you could shed some light on this for me. So far, I've only been able to start the metastore via command-line and
it runs in foreground. This is of course unacceptable in a fully
automated process. I'm assuming there's a way to run this as a service
instead, and for that I would need sqoop server?
... View more
02-04-2016
01:37 AM
2 Kudos
After some more testing, I finally resolved this issue by explicitly passing in the metastore URL in the workflow.xml like so: <arg>job</arg>
<arg>--meta-connect</arg>
<arg>jdbc:hsqldb:hsql://<myhost>:12345/sqoop</arg>
<arg>--exec</arg>
<arg>myjob</arg> I'm not exactly sure, but I think that Oozie tries to connect to a local metastore because it doesn't have a copy of `sqoop-site.xml` and so it doesn't know the metastore url (even though I'm running on a single-node configuration)?
... View more
02-04-2016
12:35 AM
@Artem Ervits our HDP is running on a single-node configuration, and I am able to list the sqoop jobs from this node
... View more
02-04-2016
12:33 AM
2 Kudos
@Shigeru Takehara I did try adding hive-site.xml and placed it in the root of the workflow directory on HDFS, but I was running into the error and the error message in the logs is this: ERROR [main] tool.ImportTool (ImportTool.java:run(613)) - Encountered IOException running import job: java.io.IOException: Hive exited with status 1 I eventually had to go with my workaround because I couldn't get hive import to work and I had deadlines to meet. I'd still like to try and get hive import to work though
... View more
02-03-2016
04:01 AM
1 Kudo
On HDP 2.3.4, using Oozie 4.2.0 and Sqoop 1.4.2, I'm trying to create a coordinator app that will execute sqoop jobs on a daily basis. I need the sqoop action to execute jobs because these are incremental imports. I've configured `sqoop-site.xml` and started the `sqoop-metastore` and I'm able to create, list, and delete jobs via the command line but the workflow encounters the error: Cannot restore job: streamsummary_incremental
stderr
Sqoop command arguments :
job
--exec
streamsummary_incremental
Fetching child yarn jobs
tag id : oozie-26fcd4dc0afd8f53316fc929ac38eae2
2016-02-03 09:46:47,193 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at <myhost>/<myIP>:8032
Child yarn jobs are found -
=================================================================
>>> Invoking Sqoop command line now >>>
2241 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2016-02-03 09:46:47,404 WARN [main] tool.SqoopTool (SqoopTool.java:loadPluginsFromConfDir(177)) - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2263 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.6.2.3.4.0-3485
2016-02-03 09:46:47,426 INFO [main] sqoop.Sqoop (Sqoop.java:<init>(97)) - Running Sqoop version: 1.4.6.2.3.4.0-3485
2552 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage - Cannot restore job: streamsummary_incremental
2016-02-03 09:46:47,715 ERROR [main] hsqldb.HsqldbJobStorage (HsqldbJobStorage.java:read(254)) - Cannot restore job: streamsummary_incremental
2552 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage - (No such job)
2016-02-03 09:46:47,715 ERROR [main] hsqldb.HsqldbJobStorage (HsqldbJobStorage.java:read(255)) - (No such job)
2553 [main] ERROR org.apache.sqoop.tool.JobTool - I/O error performing job operation: java.io.IOException: Cannot restore missing job streamsummary_incremental
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.read(HsqldbJobStorage.java:256)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:198)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:177)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:46)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
sqoop-site.xml <property>
<name>sqoop.metastore.client.enable.autoconnect</name>
<value>false</value>
<description>If true, Sqoop will connect to a local metastore for job management when no other metastore arguments are provided.
</description>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.url</name>
<value>jdbc:hsqldb:hsql://<myhost>:12345</value>
<description>The connect string to use when connecting to a job-management metastore. If unspecified, uses ~/.sqoop/. You can specify a different path here.
</description>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.username</name>
<value>SA</value>
<description>The username to bind to the metastore.
</description>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.password</name>
<value></value>
<description>The password to bind to the metastore.
</description>
</property>
<property>
<name>sqoop.metastore.server.location</name>
<value>/tmp/sqoop-metastore/shared.db</value>
<description>Path to the shared metastore database files. If this is not set, it will be placed in ~/.sqoop/.
</description>
</property>
<property>
<name>sqoop.metastore.server.port</name>
<value>12345</value>
<description>Port that this metastore should listen on.
</description>
</property>
workflow.xml <action name="sqoop-import-job">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${outputDir}"/>
</prepare>
<arg>job</arg>
<arg>--exec</arg>
<arg>${jobId}</arg>
</sqoop>
<ok to="hive-load"/>
<error to="kill-sqoop"/>
</action>
Additional info: We're only running a single-node cluster. Only Sqoop Client is
installed. I'm thinking maybe Oozie isn't able to connect to the metastore because we don't have sqoop server? Could anyone confirm this? If not that, could I have missed anything else? Thanks!
... View more
Labels:
02-03-2016
02:03 AM
2 Kudos
At the same time that I was getting this issue, I was also dealing with a network issue when trying to issue Sqoop commands via CLI. Although the network issue was resolved and I stopped seeing this IOException, I kept running into new errors that I never managed to resolve. In the end, I decided to work around it by breaking the hive import into a 2-step workflow:
sqoop action to import into HDFS hive action to load data from HDFS into hive UPDATE: It turns out that the "new errors" was because the "yarn" user doesn't belong to the "hdfs" group and so couldn't perform the hive-import part. Adding this use to the group allows me now to use hive-import in my worfklows instead of the 2-step workflow I used before.
... View more