Sqoop with --hive-import on Kerberos based cluster

Hello everyone!

 

I have a Kerberos-based cluster where I need to schedule several Oozie workflows.

Normal Sqoop and shell actions work fine, but I am having problems with Sqoop actions that import data into Hive tables (--hive-import). I have tried to use hcat credentials in the workflow, but I get the following error in the Oozie web console:

 

JA009: org.apache.hive.hcatalog.common.HCatException : 9001 : Exception occurred while processing HCat request : TException while getting delegation token.. Cause : org.apache.thrift.transport.TTransportException

This is the workflow:

 

<workflow-app xmlns="uri:oozie:workflow:0.5" name="wf-hive">

    <credentials>
        <credential name='hcat_credentials' type='hcat'>
            <property>
                <name>hcat.metastore.uri</name>
                <value>thrift://trmas-xxxxx.yyyy.local:9083</value>
            </property>
            <property>
                <name>hcat.metastore.principal</name>
                <value>hive/_HOST@XXXXX.LOCAL</value>
            </property>
        </credential>
    </credentials>

    <start to="HIVE_IMPORT"/>
    
    <action name="HIVE_IMPORT" cred="hcat_credentials">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="hdfs://trmas-6b8bc78c.xxxxx.yyyy:8020/user/myuser/mytable"/>
            </prepare>
            <job-xml>hdfs://trmas-6b8bc78c.xxxxx.yyyy:8020/user/myuser/hive-site.xml</job-xml>
            <arg>import</arg>
            <arg>--connection-manager</arg>
            <arg>com.cloudera.connector.teradata.TeradataManager</arg>
            <arg>--connect</arg>
            <arg>jdbc:teradata://x.y.z.w/DATABASE=DLT01V</arg>
            <arg>--username</arg>
            <arg>username</arg>
            <arg>--password</arg>
            <arg>password</arg>
            <arg>--query</arg>
            <arg>SELECT * from mytable WHERE $CONDITIONS</arg>
            <arg>--split-by</arg>
            <arg>__temp__</arg>
            <arg>--hive-import</arg>
            <arg>--hive-table</arg>
            <arg>dbtest.mytable</arg>
            <arg>--num-mappers</arg>
            <arg>1</arg>
            <arg>--target-dir</arg>
            <arg>hdfs://trmas-6b8bc78c.xxxxx.yyyy:8020/user/myuser/mytable</arg>
        </sqoop>
        <ok to="END_NODE"/>
        <error to="KILL_NODE"/>
    </action>

    <kill name="KILL_NODE">
        <message>${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>

    <end name="END_NODE"/>

</workflow-app>
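
For completeness, I submit the workflow with a job.properties along these lines (the ResourceManager port and the application path here are only illustrative):

nameNode=hdfs://trmas-6b8bc78c.xxxxx.yyyy:8020
jobTracker=trmas-6b8bc78c.xxxxx.yyyy:8032
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/myuser/apps/wf-hive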

 

I am not sure how to set the hcat.metastore.uri and hcat.metastore.principal properties. I have used the values of hive.metastore.uris and hive.metastore.kerberos.principal respectively, taken from hive-site.xml. Is this correct?
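
For reference, this is the mapping I made, with the values taken from my hive-site.xml (hostnames masked as above):

<!-- hive-site.xml -->
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://trmas-xxxxx.yyyy.local:9083</value>
</property>
<property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/_HOST@XXXXX.LOCAL</value>
</property>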

 

Unfortunately I can't use Sqoop2 and its support for HiveServer2 authentication, because the Cloudera Teradata connector does not support Sqoop2 yet, so I think I have to authenticate with the HCat credentials.
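
For context, the plain Sqoop 1 command I am trying to reproduce in the Oozie action would look roughly like this, run with a valid Kerberos ticket (kinit); the username, password and connection details are the same placeholders as in the workflow:

sqoop import \
  --connection-manager com.cloudera.connector.teradata.TeradataManager \
  --connect jdbc:teradata://x.y.z.w/DATABASE=DLT01V \
  --username username \
  --password password \
  --query 'SELECT * from mytable WHERE $CONDITIONS' \
  --split-by __temp__ \
  --hive-import \
  --hive-table dbtest.mytable \
  --num-mappers 1 \
  --target-dir hdfs://trmas-6b8bc78c.xxxxx.yyyy:8020/user/myuser/mytable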

 

Can anyone provide some help on this one?

 
