Support Questions

Find answers, ask questions, and share your expertise

Sqoop free from query with oozie not working

avatar
Explorer

Hi All,

I am trying to do sqoop import with where condition/free form query with oozie and it is failing.

my oozie action looks like :

<action name="sqoop-action">
   <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
         <delete path="${deleteHdfsPath}"/>
      </prepare>
      <configuration>
         <property>
            <name>oozie.hive.defaults</name>
            <value>/usr/hdp/current/hive-client/conf/hive-site.xml</value>
         </property>
      </configuration>
      <command>${command}</command>
   </sqoop>
   <ok to="ok" />
   <error to="kill" />
</action>

In properties file for command argument looks like this :

import --connect jdbc:mysql://<>:3306/test --username hive --password hive --query "SELECT * FROM _table WHERE \$CONDITIONS AND _id > 0 AND _id <= 1000000" --split-by _id --fields-terminated-by \| --target-dir /apps/hive/warehouse/hive_table

and I have tried with below also.

import --connect jdbc:mysql://ip-172-31-5-150.ec2.internal:3306/test --username hive --password hive --table _table --where "_id > 0 AND _id <= 1000000" --fields-terminated-by \| --target-dir /apps/hive/warehouse/hive_table

I am getting below error with both the way, it saying Unrecognized argument.

Sqoop command arguments :
             import
             --connect
             jdbc:mysql://<>:3306/test
             --username
             hive
             --password
             hive
             --query
             SELECT
             *
             FROM
             _table
             WHERE
             $CONDITIONS
             AND
             _id
             >
             0
             AND
             _id
             <=
             1000000
             --split-by
             _id
             --fields-terminated-by
             |
             --target-dir
             /apps/hive/warehouse/hive_table
Fetching child yarn jobs
tag id : oozie-fcc0762084b58ea3c408ef7887cc26a7
2016-05-25 23:58:33,814 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at /172.31.5.150:8050
Child yarn jobs are found - 
=================================================================
>>> Invoking Sqoop command line now >>>
3601 [main] WARN  org.apache.sqoop.tool.SqoopTool  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2016-05-25 23:58:34,126 WARN  [main] tool.SqoopTool (SqoopTool.java:loadPluginsFromConfDir(177)) - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
3642 [main] INFO  org.apache.sqoop.Sqoop  - Running Sqoop version: 1.4.6.2.4.0.0-169
2016-05-25 23:58:34,167 INFO  [main] sqoop.Sqoop (Sqoop.java:<init>(97)) - Running Sqoop version: 1.4.6.2.4.0.0-169
3667 [main] WARN  org.apache.sqoop.tool.BaseSqoopTool  - Setting your password on the command-line is insecure. Consider using -P instead.
2016-05-25 23:58:34,192 WARN  [main] tool.BaseSqoopTool (BaseSqoopTool.java:applyCredentialsOptions(1026)) - Setting your password on the command-line is insecure. Consider using -P instead.
3668 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Error parsing arguments for import:
2016-05-25 23:58:34,193 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(304)) - Error parsing arguments for import:
3669 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: *
2016-05-25 23:58:34,194 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: *
3669 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: FROM
2016-05-25 23:58:34,194 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: FROM
3669 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: _product
2016-05-25 23:58:34,194 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: _product
3669 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: WHERE
2016-05-25 23:58:34,194 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: WHERE
3670 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: $CONDITIONS
2016-05-25 23:58:34,195 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: $CONDITIONS
3670 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: AND
2016-05-25 23:58:34,195 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: AND
3671 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: _id
2016-05-25 23:58:34,196 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: _id
3671 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: >
2016-05-25 23:58:34,196 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: >
3671 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: 0
2016-05-25 23:58:34,196 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: 1000000
3673 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: AND
2016-05-25 23:58:34,198 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: AND
3673 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: _id
2016-05-25 23:58:34,198 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: _id
3677 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: <=
2016-05-25 23:58:34,202 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: <=
3677 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: 1000000
2016-05-25 23:58:34,202 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: 2000000
3678 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: --split-by
2016-05-25 23:58:34,203 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: --split-by
3678 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: _id
2016-05-25 23:58:34,203 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: _id
3678 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: --fields-terminated-by
2016-05-25 23:58:34,203 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: --fields-terminated-by
3678 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: |
2016-05-25 23:58:34,203 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: |
3678 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: --target-dir
2016-05-25 23:58:34,203 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: --target-dir
3679 [main] ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: /apps/hive/warehouse/n_product_2
2016-05-25 23:58:34,204 ERROR [main] tool.BaseSqoopTool (BaseSqoopTool.java:hasUnrecognizedArgs(307)) - Unrecognized argument: /apps/hive/warehouse/n_product_2
Intercepting System.exit(1)
<<< Invocation of Main class completed <<<

Sqoop command working fine in both the case, But running with oozie not working. All through simple sqoop action is working(without query - whole table fetch) fine with oozie.. but some how free form query is not working with oozie.

Let me know, if I am missing something.

Thanks in advance,

Ankit

1 ACCEPTED SOLUTION

avatar
Super Guru

Hi @Ankit A

In this case you need to set your "import *" statement into set of arguments as mentioned by @Christine so that oozie can pass it to sqoop.

Here is an example workflow. Link

<workflow-app xmlns="uri:oozie:workflow:0.2" name="sqoop-freeform-wf">
    <start to="sqoop-freeform-node"/>

    <action name="sqoop-freeform-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/sqoop-freeform"/>
                <mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:hsqldb:file:db.hsqldb</arg>
            <arg>--username</arg>
            <arg>sa</arg>
            <arg>--password</arg>
            <arg></arg>
            <arg>--verbose</arg>
            <arg>--query</arg>
            <arg>select TT.I, TT.S from TT where $CONDITIONS</arg>
            <arg>--target-dir</arg>
            <arg>/user/${wf:user()}/${examplesRoot}/output-data/sqoop-freeform</arg>
            <arg>-m</arg>
            <arg>1</arg>
            <file>db.hsqldb.properties#db.hsqldb.properties</file>
            <file>db.hsqldb.script#db.hsqldb.script</file>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>


    <kill name="fail">
        <message>Sqoop free form failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>



View solution in original post

4 REPLIES 4

avatar
Rising Star

Try setting the sqoop command up with arguments (nothing should be in command). Properties and values for each element should be entered as separate arguments.

arg:  import
arg:  --connect
arg:  jdbc:mysql....
arg:  --username
arg:  [username]
arg:  --password-file
arg:  [password file]
arg:  --query
arg:  select .....
arg:  --target-dir
arg:  [target]

avatar
Explorer

Thanks that worked

avatar
Super Guru

Hi @Ankit A

In this case you need to set your "import *" statement into set of arguments as mentioned by @Christine so that oozie can pass it to sqoop.

Here is an example workflow. Link

<workflow-app xmlns="uri:oozie:workflow:0.2" name="sqoop-freeform-wf">
    <start to="sqoop-freeform-node"/>

    <action name="sqoop-freeform-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/sqoop-freeform"/>
                <mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:hsqldb:file:db.hsqldb</arg>
            <arg>--username</arg>
            <arg>sa</arg>
            <arg>--password</arg>
            <arg></arg>
            <arg>--verbose</arg>
            <arg>--query</arg>
            <arg>select TT.I, TT.S from TT where $CONDITIONS</arg>
            <arg>--target-dir</arg>
            <arg>/user/${wf:user()}/${examplesRoot}/output-data/sqoop-freeform</arg>
            <arg>-m</arg>
            <arg>1</arg>
            <file>db.hsqldb.properties#db.hsqldb.properties</file>
            <file>db.hsqldb.script#db.hsqldb.script</file>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>


    <kill name="fail">
        <message>Sqoop free form failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>



avatar
Explorer

Thanks, That worked