Expert Contributor
Posts: 65
Registered: ‎11-24-2017

Sqoop with --hive-import on Kerberos based cluster

Hello everyone!

 

I have a Kerberos-based cluster where I need to schedule several Oozie workflows.

Normal Sqoop and shell actions work fine, but I am having problems with Sqoop actions that import data into Hive tables (--hive-import). I've tried to use HCat credentials in the workflow, but I get the following error in the Oozie web console:

 

JA009: org.apache.hive.hcatalog.common.HCatException : 9001 : Exception occurred while processing HCat request : TException while getting delegation token.. Cause : org.apache.thrift.transport.TTransportException

This is the workflow:

 

<workflow-app xmlns="uri:oozie:workflow:0.5" name="wf-hive">

    <credentials>
        <credential name='hcat_credentials' type='hcat'>
            <property>
                <name>hcat.metastore.uri</name>
                <value>thrift://trmas-xxxxx.yyyy.local:9083</value>
            </property>
            <property>
                <name>hcat.metastore.principal</name>
                <value>hive/_HOST@XXXXX.LOCAL</value>
            </property>
        </credential>
    </credentials>

    <start to="HIVE_IMPORT"/>
    
    <action name="HIVE_IMPORT" cred="hcat_credentials">
        <sqoop
            xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="hdfs://trmas-6b8bc78c.xxxxx.yyyy:8020/user/myuser/mytable"/>
            </prepare>
            <job-xml>hdfs://trmas-6b8bc78c.xxxxx.yyyy:8020/user/myuser/hive-site.xml</job-xml>
            <arg>import</arg>
            <arg>--connection-manager</arg>
            <arg>com.cloudera.connector.teradata.TeradataManager</arg>
            <arg>--connect</arg>
            <arg>jdbc:teradata://x.y.z.w/DATABASE=DLT01V</arg>
            <arg>--username</arg>
            <arg>username</arg>
            <arg>--password</arg>
            <arg>password</arg>
            <arg>--query</arg>
            <arg>SELECT * from mytable WHERE $CONDITIONS</arg>
            <arg>--split-by</arg>
            <arg>__temp__</arg>
            <arg>--hive-import</arg>
            <arg>--hive-table</arg>
            <arg>dbtest.mytable</arg>
            <arg>--num-mappers</arg>
            <arg>1</arg>
            <arg>--target-dir</arg>
            <arg>hdfs://trmas-6b8bc78c.xxxxx.yyyy:8020/user/myuser/mytable</arg>
        </sqoop>
        <ok to="END_NODE"/>
        <error to="KILL_NODE"/>
    </action>

    <kill name="KILL_NODE">
        <message>${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>

    <end name="END_NODE"/>

</workflow-app>

 

I am not sure how to set the hcat.metastore.uri and hcat.metastore.principal properties. I used the values of hive.metastore.uris and hive.metastore.kerberos.principal, respectively, from hive-site.xml; is this correct?
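For clarity, this is the mapping I assumed between the credential properties and hive-site.xml (the URI and principal below are placeholders from my cluster, not verified values):

&lt;!-- hive-site.xml (source values) --&gt;
&lt;property&gt;
    &lt;name&gt;hive.metastore.uris&lt;/name&gt;
    &lt;value&gt;thrift://trmas-xxxxx.yyyy.local:9083&lt;/value&gt;      &lt;!-- copied into hcat.metastore.uri --&gt;
&lt;/property&gt;
&lt;property&gt;
    &lt;name&gt;hive.metastore.kerberos.principal&lt;/name&gt;
    &lt;value&gt;hive/_HOST@XXXXX.LOCAL&lt;/value&gt;                     &lt;!-- copied into hcat.metastore.principal --&gt;
&lt;/property&gt;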

 

Unfortunately I can't use Sqoop2 and its support for HiveServer2 authentication, because the Cloudera Teradata connector does not support Sqoop2 yet, so I think I need to authenticate via the HCat credentials.

 

Can anyone provide help on this one?

 

Cloudera Employee
Posts: 294
Registered: ‎03-23-2015

Re: Sqoop with --hive-import on Kerberos based cluster

You said you got the error from the Oozie web console; have you checked the actual launcher job for the exact error message? I am not sure the error you posted is actually related to the job you ran.

When Oozie runs a workflow, it starts a launcher job, which is itself a MapReduce job; the launcher is responsible for starting the actual action, in your case a Sqoop1 action. While running the Sqoop action, the launcher captures the output of the Sqoop job in its mapper's stdout, so you need to find the launcher job in the JobHistory Server web UI and locate the full logs to see the exact error of the failed job. Hope that makes sense. Once you find it, please share the error message.
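If you prefer the command line, something like the following should get you there (illustrative only; the workflow and application IDs are placeholders you need to substitute from your own run):

# Get the launcher job's external ID for the failed action
oozie job -info &lt;workflow-id&gt; -verbose

# Pull the full container logs, including the mapper stdout with the Sqoop output
yarn logs -applicationId &lt;application-id&gt; | less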

Regarding Sqoop2, it has been deprecated since CDH 5.9 and will be removed in CDH 6, so you should not go with that option regardless.

Cheers