Member since: 12-10-2015
Posts: 43
Kudos Received: 39
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 857 | 02-04-2016 01:37 AM |
| | 3944 | 02-03-2016 02:03 AM |
| | 1926 | 01-26-2016 08:00 AM |
05-10-2016
01:13 AM
Unfortunately, I can't help you there. It's not something I've tried. You should create a new question for that to get better visibility and hopefully attract answers.
03-31-2016
05:48 AM
I'm able to do the incremental import when I don't specify --fields-terminated-by '\0'.
03-30-2016
08:20 AM
1 Kudo
I'm trying to import data from a MySQL table which contains free text, using the following command:
sqoop import \
--connect jdbc:mysql://mysqlhost:3306/db \
--table t1 \
--username u \
-P \
--incremental lastmodified \
--check-column updated \
--last-value \
--split-by id \
--fields-terminated-by '\0' \
--hive-drop-import-delims \
--target-dir /path/to/target \
--merge-key id
The import part seems to work correctly, but then the merge fails with:
Error: java.lang.RuntimeException: Can't parse input data: '12016-03-17 13:48:28.0nullfoo'
at t1.__loadFromFields(t1.java:343)
at t1.parse(t1.java:276)
at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:53)
at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:34)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.NumberFormatException: For input string: "12016-03-17 13:48:28.0nullfoo"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.valueOf(Integer.java:766)
at t1.__loadFromFields(t1.java:325)
... 11 more
I also tried providing my own jar, generated with sqoop codegen using the same formatting options, but it made no difference.
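For completeness, here's a sketch of the same import using Hive's default ^A delimiter instead of the null byte - an untested variation on my part, since the thread only confirms that dropping --fields-terminated-by '\0' avoids the error (other arguments unchanged from above, with --last-value still omitted):
sqoop import \
  --connect jdbc:mysql://mysqlhost:3306/db \
  --table t1 --username u -P \
  --incremental lastmodified --check-column updated \
  --split-by id \
  --fields-terminated-by '\001' \
  --hive-drop-import-delims \
  --target-dir /path/to/target \
  --merge-key id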
03-18-2016
07:13 AM
2 Kudos
I did try it before I updated my post, and the OR condition doesn't work for me. Only the condition on the left side of the OR seems to be evaluated. For example, in the condition "where created >= yesterday or updated >= yesterday" - only "created >= yesterday" is evaluated. I tried reversing the order of the conditions giving me "where updated >= yesterday or created >= yesterday" - and in this case only "updated >= yesterday" is evaluated.
03-18-2016
01:30 AM
2 Kudos
Just went through the documentation though, and there's a note saying free-form queries can't have OR in the WHERE clause?
03-18-2016
12:57 AM
1 Kudo
I'd thought of that actually, though I try to avoid hard-coding SQL whenever I use Sqoop. To me, it feels a bit hacky. It'd be interesting to hear people's thoughts on that, however.
03-17-2016
08:43 AM
3 Kudos
I have to import daily data from SQL Server using Sqoop (1.4.6). Naturally, incremental imports would seem to be the way to go, but here's the thing - the source tables I have to import have separate 'created' and 'updated' columns. That is, newly-created records only have a timestamp value under the 'created' column, and records will only have a value for the 'updated' column if they are edited anytime after first creation. Looking through the documentation, it doesn't look like it's possible to have 2 check columns when doing incremental imports, so the only way I've managed to get this done is with 2 separate imports:
1. Incremental import with the primary key as the check column, for NEW records
2. Incremental import with the updated column as the check column, for UPDATED records
This works, but I wonder if there's a better way to do this. Any thoughts?
Update: I tried the suggestion by @Sourygna Luangsay to use free-form query imports; however, the documentation quite clearly states that the query can't contain OR conditions in the WHERE clause. Besides, since these will be incremental imports, the output directory would exist and I'd still need a two-step workflow: the first step for importing, then the second for merging.
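For reference, a rough sketch of the two-step approach above using saved Sqoop jobs (so the last-value bookkeeping is handled by the metastore); the connection string, job names, and the 'id' primary-key column are placeholders of mine, not details from the thread:
# Step 1: pick up NEW records, keyed on the primary key
sqoop job --create t1_new_rows -- import \
  --connect 'jdbc:sqlserver://sqlhost:1433;databaseName=db' \
  --table t1 --username u -P \
  --incremental append --check-column id \
  --target-dir /data/t1

# Step 2: pick up UPDATED records, keyed on the 'updated' timestamp and merged on the key
sqoop job --create t1_updated_rows -- import \
  --connect 'jdbc:sqlserver://sqlhost:1433;databaseName=db' \
  --table t1 --username u -P \
  --incremental lastmodified --check-column updated \
  --merge-key id \
  --target-dir /data/t1

# Run both daily
sqoop job --exec t1_new_rows
sqoop job --exec t1_updated_rows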
02-10-2016
01:47 AM
Thanks @Predrag Minovic! I guess I'll have to do some trial and error to see which column, other than the binary-type primary keys, we can use as a splitting column. The fact that all of the primary keys in this database are binary-type has given me quite a number of head-scratching moments, particularly with the incremental imports.
02-10-2016
01:44 AM
1 Kudo
@Artem Ervits our DBAs probably know how to perform traces on the DB Server. I think they're just used to asking our dev team for the actual SQL first, since most of the applications using this DB server are written as stored procedures.
02-10-2016
01:43 AM
1 Kudo
Thanks for the link @Neeraj Sabharwal. It's very helpful. I guess I'll have to try to play around with the Sqoop commands to find a splitting column that will work best for us.
02-10-2016
01:23 AM
@Geoffrey Shelton Okot thanks for the link, but it doesn't really help in this instance since the DBAs are only concerned with the source (SQL Server) system. The DBAs can probably generate the stats and execution plan on the DB Server. I think they're just used to asking our dev team for the actual SQL first, since most of the applications using this DB server are written as stored procedures.
02-10-2016
01:21 AM
Thanks for this @Scott Shaw. By "running queries" do you mean the SQL you provided will only display results for queries that are currently running? In any case, the DBAs can probably generate the stats and execution plan on the DB Server. I think they're just used to asking our dev team for the actual SQL first, since most of the applications using this DB server are written as stored procedures.
02-10-2016
01:17 AM
Hi @Predrag Minovic, if in the Sqoop imports I'm running I always specify 1 mapper (the primary keys in the source table are binary type; using these columns to split the job into tasks throws an exception), would it be correct to assume that the generated SQL statement becomes a simple SELECT * with no WHERE condition?
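For illustration, a sketch of the two options discussed here; the connection string and column names are placeholders, and my understanding (not confirmed in this thread) is that with a single mapper Sqoop skips the bounding-values query and the per-split WHERE clauses:
# Option 1: single mapper - no split column required
sqoop import \
  --connect 'jdbc:sqlserver://sqlhost:1433;databaseName=db' \
  --table t1 --username u -P \
  -m 1 \
  --target-dir /data/t1

# Option 2: keep parallelism by splitting on a non-binary column
sqoop import \
  --connect 'jdbc:sqlserver://sqlhost:1433;databaseName=db' \
  --table t1 --username u -P \
  --split-by updated \
  --target-dir /data/t1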
02-09-2016
06:17 AM
2 Kudos
Hi @Shigeru Takehara, the jdbc driver was in the Oozie sharelib. Also, the import was working fine - I see the data within HDFS. It's just the loading into Hive that fails. That's why I just opted to break the import into two separate steps.
02-09-2016
06:04 AM
1 Kudo
In Sqoop 1.4.2, is there a way to output the exact SQL statements executed? The standard output always displays something like this:
INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM [tableName] AS t WHERE 1=0
but this isn't very helpful for our purposes. Our database admins asked me if I could provide the SQL statements so they can adjust database indices if necessary. They probably know how to perform analysis on the DB Server themselves (we're using SQL Server for this; I can't really say on their behalf), but since most of the applications using this DB server are written as stored procedures, I think they're used to just asking our dev team for the actual SQL instead.
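One thing that may be worth trying (a suggestion of mine, not something confirmed in this thread) is Sqoop's --verbose flag, which raises the log level and should surface more of the generated SQL, such as the bounding-values and split queries, in the console and task logs. A sketch with placeholder connection details:
sqoop import --verbose \
  --connect 'jdbc:sqlserver://sqlhost:1433;databaseName=db' \
  --table t1 --username u -P \
  --target-dir /data/t1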
02-04-2016
02:24 AM
I find that odd, and such a shame. A metastore service seems like something that would be an important requirement for running incremental imports as jobs called from a coordinator app. I did see this blog describing how to manually set up MySQL to work as the metastore, though I never tried it myself. I wonder if this would be preferable to running `sqoop metastore &`?
02-04-2016
01:49 AM
Thanks for all of your help so far, Artem. I do have a question regarding the metastore though - hopefully you could shed some light on this for me. So far, I've only been able to start the metastore via the command line, and it runs in the foreground. This is of course unacceptable in a fully automated process. I'm assuming there's a way to run this as a service instead, and for that I would need Sqoop server?
02-04-2016
01:37 AM
2 Kudos
After some more testing, I finally resolved this issue by explicitly passing in the metastore URL in the workflow.xml, like so:
<arg>job</arg>
<arg>--meta-connect</arg>
<arg>jdbc:hsqldb:hsql://<myhost>:12345/sqoop</arg>
<arg>--exec</arg>
<arg>myjob</arg>
I'm not exactly sure, but I think that Oozie tries to connect to a local metastore because it doesn't have a copy of `sqoop-site.xml` and so it doesn't know the metastore URL (even though I'm running on a single-node configuration)?
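For reference, the equivalent command line for testing the same fix outside Oozie (the host placeholder and job name are taken from the arguments above):
sqoop job --meta-connect jdbc:hsqldb:hsql://<myhost>:12345/sqoop --exec myjob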
02-04-2016
12:35 AM
@Artem Ervits our HDP is running on a single-node configuration, and I am able to list the sqoop jobs from this node.
02-04-2016
12:33 AM
2 Kudos
@Shigeru Takehara I did try adding hive-site.xml, placing it in the root of the workflow directory on HDFS, but I was still running into the error. The error message in the logs is this:
ERROR [main] tool.ImportTool (ImportTool.java:run(613)) - Encountered IOException running import job: java.io.IOException: Hive exited with status 1
I eventually had to go with my workaround because I couldn't get hive import to work and I had deadlines to meet. I'd still like to try to get hive import to work, though.
02-03-2016
04:01 AM
1 Kudo
On HDP 2.3.4, using Oozie 4.2.0 and Sqoop 1.4.2, I'm trying to create a coordinator app that will execute sqoop jobs on a daily basis. I need the sqoop action to execute jobs because these are incremental imports. I've configured `sqoop-site.xml` and started the `sqoop-metastore`, and I'm able to create, list, and delete jobs via the command line, but the workflow encounters this error:
Cannot restore job: streamsummary_incremental
stderr
Sqoop command arguments :
job
--exec
streamsummary_incremental
Fetching child yarn jobs
tag id : oozie-26fcd4dc0afd8f53316fc929ac38eae2
2016-02-03 09:46:47,193 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at <myhost>/<myIP>:8032
Child yarn jobs are found -
=================================================================
>>> Invoking Sqoop command line now >>>
2241 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2016-02-03 09:46:47,404 WARN [main] tool.SqoopTool (SqoopTool.java:loadPluginsFromConfDir(177)) - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2263 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.6.2.3.4.0-3485
2016-02-03 09:46:47,426 INFO [main] sqoop.Sqoop (Sqoop.java:<init>(97)) - Running Sqoop version: 1.4.6.2.3.4.0-3485
2552 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage - Cannot restore job: streamsummary_incremental
2016-02-03 09:46:47,715 ERROR [main] hsqldb.HsqldbJobStorage (HsqldbJobStorage.java:read(254)) - Cannot restore job: streamsummary_incremental
2552 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage - (No such job)
2016-02-03 09:46:47,715 ERROR [main] hsqldb.HsqldbJobStorage (HsqldbJobStorage.java:read(255)) - (No such job)
2553 [main] ERROR org.apache.sqoop.tool.JobTool - I/O error performing job operation: java.io.IOException: Cannot restore missing job streamsummary_incremental
at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.read(HsqldbJobStorage.java:256)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:198)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197)
at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:177)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:46)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
sqoop-site.xml:
<property>
<name>sqoop.metastore.client.enable.autoconnect</name>
<value>false</value>
<description>If true, Sqoop will connect to a local metastore for job management when no other metastore arguments are provided.
</description>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.url</name>
<value>jdbc:hsqldb:hsql://<myhost>:12345</value>
<description>The connect string to use when connecting to a job-management metastore. If unspecified, uses ~/.sqoop/. You can specify a different path here.
</description>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.username</name>
<value>SA</value>
<description>The username to bind to the metastore.
</description>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.password</name>
<value></value>
<description>The password to bind to the metastore.
</description>
</property>
<property>
<name>sqoop.metastore.server.location</name>
<value>/tmp/sqoop-metastore/shared.db</value>
<description>Path to the shared metastore database files. If this is not set, it will be placed in ~/.sqoop/.
</description>
</property>
<property>
<name>sqoop.metastore.server.port</name>
<value>12345</value>
<description>Port that this metastore should listen on.
</description>
</property>
workflow.xml:
<action name="sqoop-import-job">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${outputDir}"/>
</prepare>
<arg>job</arg>
<arg>--exec</arg>
<arg>${jobId}</arg>
</sqoop>
<ok to="hive-load"/>
<error to="kill-sqoop"/>
</action>
Additional info: We're only running a single-node cluster. Only the Sqoop client is installed. I'm thinking maybe Oozie isn't able to connect to the metastore because we don't have Sqoop server? Could anyone confirm this? If not that, could I have missed anything else? Thanks!
02-03-2016
02:03 AM
2 Kudos
At the same time that I was getting this issue, I was also dealing with a network problem when issuing Sqoop commands via the CLI. Although the network problem was resolved and I stopped seeing this IOException, I kept running into new errors that I never managed to resolve. In the end, I decided to work around it by breaking the Hive import into a two-step workflow:
1. A sqoop action to import into HDFS
2. A hive action to load the data from HDFS into Hive
UPDATE: It turns out that the "new errors" were because the "yarn" user doesn't belong to the "hdfs" group and so couldn't perform the hive-import part. Adding the user to the group now allows me to use hive-import in my workflows instead of the two-step workflow I used before.
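For reference, a typical way to apply that group change on the node (run as root); this is my assumption of how it's usually done, and the exact user and group names may vary by install:
usermod -a -G hdfs yarn
id yarn   # verify that 'hdfs' now shows up in the yarn user's groups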
02-03-2016
01:56 AM
2 Kudos
@Artem Ervits I never got sqoop's hive-import to work in an Oozie workflow, so I came up with a workaround instead. Will provide my workaround as an answer. Thanks.
02-03-2016
12:05 AM
@Artem Ervits yes, this has been resolved, but I didn't accept my own answer because we never found out what exactly was wrong. The issue merely stopped occurring after we reinstalled HDP on a server with no virtualization - I detail this in my answer. In any case, I've accepted my own answer as a reference, for the benefit of others. Thanks!
01-28-2016
02:39 AM
2 Kudos
@David Tam no, it was in "Running" state before getting killed. The yarn.resourcemanager.address setting in our yarn configs is set to port 8050, so I'm not really sure why there was an attempt to connect to 8032. I tried yarn-client mode, but I still get the same error.
01-26-2016
08:00 AM
I never heard back from our network team with regards to firewall logs, but our NameNode's OS got corrupted and had to be reformatted, and HDP reinstalled. For some reason we're not encountering this error any longer. One difference between the original cluster and the new installation is that we had 4 nodes (1 name node and 3 data nodes) which were virtualized in a single server. Now, we're running a single-node cluster (HDP 2.3.4) with no virtualization on the server.
01-20-2016
12:54 AM
@Scott Shaw I removed mapred.task.timeout like you suggested, but I'm still getting the same results. As for the JDBC driver, it's the same one that we downloaded and placed in the Sqoop lib. I'm able to connect to the SQL Server with no problems via SQL Server Management Studio on my workstation, which is on the same corporate network as the SQL Server, whereas the Hadoop cluster is on a separate network. I actually had to explicitly request that our network guys allow traffic from the cluster into the SQL Server, so I'm starting to suspect there's something in the network that's causing this issue.
01-19-2016
01:17 AM
@Neeraj Sabharwal thanks for your response. I checked out the link you shared, but it doesn't seem to be the same problem. I'm able to connect to the SQL Server; it's just that I always see that error, particularly in between map tasks. In any case, I'm coordinating with our network guys as well. I can't be sure, but I do believe the SQL Server resides in a different corporate network from the Hadoop cluster. I'll update my post once I get further details.
01-19-2016
01:10 AM
@Scott Shaw I pinged the SQL Server and saw no dropped packets. I can also telnet to port 1433, and nmap says the port is open. However, trying it the other way around, I'm unable to ping the Hadoop cluster from the SQL Server. I'm not really sure if this is significant.