Created 05-26-2016 08:41 AM
Hi All,
I am trying to do sqoop import with where condition/free form query with oozie and it is failing.
my oozie action looks like :
<action name="sqoop-action">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${deleteHdfsPath}"/>
</prepare>
<configuration>
<property>
<name>oozie.hive.defaults</name>
<value>/usr/hdp/current/hive-client/conf/hive-site.xml</value>
</property>
</configuration>
<command>${command}</command>
</sqoop>
<ok to="ok" />
<error to="kill" />
</action>In properties file for command argument looks like this :
import --connect jdbc:mysql://<>:3306/test --username hive --password hive --query "SELECT * FROM _table WHERE \$CONDITIONS AND _id > 0 AND _id <= 1000000" --split-by _id --fields-terminated-by \| --target-dir /apps/hive/warehouse/hive_table
and I have tried with below also.
import --connect jdbc:mysql://ip-172-31-5-150.ec2.internal:3306/test --username hive --password hive --table _table --where "_id > 0 AND _id <= 1000000" --fields-terminated-by \| --target-dir /apps/hive/warehouse/hive_table
I am getting below error with both the way, it saying Unrecognized argument.
Sqoop command arguments :
import
--connect
jdbc:mysql://<>:3306/test
--username
hive
--password
hive
--query
SELECT
*
FROM
_table
WHERE
$CONDITIONS
AND
_id
>
0
AND
_id
<=
1000000
--split-by
_id
--fields-terminated-by
|
--target-dir
/apps/hive/warehouse/hive_table
Fetching child yarn jobs
tag id : oozie-fcc0762084b58ea3c408ef7887cc26a7
2016-05-25 23:58:33,814 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at /172.31.5.150:8050
Child yarn jobs are found -
=================================================================
>>> Invoking Sqoop command line now >>>
3601 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2016-05-25 23:58:34,126 WARN [main] tool.SqoopTool (SqoopTool.java:loadPluginsFromConfDir(177)) - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
3642 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.6.2.4.0.0-169
2016-05-25 23:58:34,167 INFO [main] sqoop.Sqoop (Sqoop.java:<init>(97)) - Running Sqoop version: 1.4.6.2.4.0.0-169
3667 [main] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
2016-05-25 23:58:34,192 WARN [main] tool.BaseSqoopTool (BaseSqoopTool.java:applyCredentialsOptions(1026)) - Setting your password on the command-line is insecure. Consider using -P instead.
3668 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Error parsing arguments for import:
2016-05-25 23:58:34,193 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(304)) - Error parsing arguments for import:
3669 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: *
2016-05-25 23:58:34,194 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: *
3669 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: FROM
2016-05-25 23:58:34,194 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: FROM
3669 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: _product
2016-05-25 23:58:34,194 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: _product
3669 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: WHERE
2016-05-25 23:58:34,194 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: WHERE
3670 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: $CONDITIONS
2016-05-25 23:58:34,195 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: $CONDITIONS
3670 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: AND
2016-05-25 23:58:34,195 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: AND
3671 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: _id
2016-05-25 23:58:34,196 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: _id
3671 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: >
2016-05-25 23:58:34,196 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: >
3671 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: 0
2016-05-25 23:58:34,196 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: 1000000
3673 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: AND
2016-05-25 23:58:34,198 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: AND
3673 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: _id
2016-05-25 23:58:34,198 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: _id
3677 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: <=
2016-05-25 23:58:34,202 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: <=
3677 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: 1000000
2016-05-25 23:58:34,202 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: 2000000
3678 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --split-by
2016-05-25 23:58:34,203 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: --split-by
3678 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: _id
2016-05-25 23:58:34,203 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: _id
3678 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --fields-terminated-by
2016-05-25 23:58:34,203 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: --fields-terminated-by
3678 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: |
2016-05-25 23:58:34,203 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: |
3678 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: --target-dir
2016-05-25 23:58:34,203 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: --target-dir
3679 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool - Unrecognized argument: /apps/hive/warehouse/n_product_2
2016-05-25 23:58:34,204 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: /apps/hive/warehouse/n_product_2
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<Sqoop command working fine in both the case, But running with oozie not working. All through simple sqoop action is working(without query - whole table fetch) fine with oozie.. but some how free form query is not working with oozie.
Let me know, if I am missing something.
Thanks in advance,
Ankit
Created 05-26-2016 10:11 PM
Hi @Ankit A
In this case you need to set your "import *" statement into set of arguments as mentioned by @Christine so that oozie can pass it to sqoop.
Here is an example workflow. Link
<workflow-app xmlns="uri:oozie:workflow:0.2" name="sqoop-freeform-wf">
<start to="sqoop-freeform-node"/>
<action name="sqoop-freeform-node">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/sqoop-freeform"/>
<mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<arg>import</arg>
<arg>--connect</arg>
<arg>jdbc:hsqldb:file:db.hsqldb</arg>
<arg>--username</arg>
<arg>sa</arg>
<arg>--password</arg>
<arg></arg>
<arg>--verbose</arg>
<arg>--query</arg>
<arg>select TT.I, TT.S from TT where $CONDITIONS</arg>
<arg>--target-dir</arg>
<arg>/user/${wf:user()}/${examplesRoot}/output-data/sqoop-freeform</arg>
<arg>-m</arg>
<arg>1</arg>
<file>db.hsqldb.properties#db.hsqldb.properties</file>
<file>db.hsqldb.script#db.hsqldb.script</file>
</sqoop>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Sqoop free form failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
Created 05-26-2016 08:04 PM
Try setting the sqoop command up with arguments (nothing should be in command). Properties and values for each element should be entered as separate arguments.
arg: import arg: --connect arg: jdbc:mysql.... arg: --username arg: [username] arg: --password-file arg: [password file] arg: --query arg: select ..... arg: --target-dir arg: [target]
Created 05-29-2016 01:02 PM
Thanks that worked
Created 05-26-2016 10:11 PM
Hi @Ankit A
In this case you need to set your "import *" statement into set of arguments as mentioned by @Christine so that oozie can pass it to sqoop.
Here is an example workflow. Link
<workflow-app xmlns="uri:oozie:workflow:0.2" name="sqoop-freeform-wf">
<start to="sqoop-freeform-node"/>
<action name="sqoop-freeform-node">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/sqoop-freeform"/>
<mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<arg>import</arg>
<arg>--connect</arg>
<arg>jdbc:hsqldb:file:db.hsqldb</arg>
<arg>--username</arg>
<arg>sa</arg>
<arg>--password</arg>
<arg></arg>
<arg>--verbose</arg>
<arg>--query</arg>
<arg>select TT.I, TT.S from TT where $CONDITIONS</arg>
<arg>--target-dir</arg>
<arg>/user/${wf:user()}/${examplesRoot}/output-data/sqoop-freeform</arg>
<arg>-m</arg>
<arg>1</arg>
<file>db.hsqldb.properties#db.hsqldb.properties</file>
<file>db.hsqldb.script#db.hsqldb.script</file>
</sqoop>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Sqoop free form failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
Created 05-29-2016 01:02 PM
Thanks, That worked