
Oozie Sqoop action fails in kerberized cluster - HDP 2.5

Contributor

Hi ,

I have a Sqoop job set up properly in a kerberized cluster, and if I execute it with `sqoop job -exec myJobName` it works fine. But when I use the following command to run an Oozie workflow that contains the above Sqoop job as an action, it fails:

oozie job -oozie http://myoozieServer:11000/oozie -config workflow.properties -run
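For context, the workflow.properties file referenced above is not shown in the thread; a minimal sketch of what such a file typically contains (all hostnames and paths here are placeholders, not values taken from this thread) would be:

```properties
# Hypothetical workflow.properties -- hostnames and paths are placeholders
nameNode=hdfs://mynamenode:8020
jobTracker=myresourcemanager:8050
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/myuser/sqoop-workflow
```

Setting oozie.use.system.libpath=true matters for Sqoop actions, since the Sqoop and HCatalog jars usually come from the Oozie sharelib.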

The error captured from the job history is:

246498 [Thread-19] INFO  org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities  - 	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1549)
246498 [Thread-19] INFO  org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities  - 	... 13 more
246498 [Thread-19] INFO  org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities  - Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: GSS initiate failed
246498 [Thread-19] INFO  org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities  - 	at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
246498 [Thread-19] INFO  org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities  - 	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:316)

and in the Hive Metastore log file I noticed the same error:

java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed

I have now updated workflow.xml following this link:

https://prodlife.wordpress.com/2013/11/22/using-oozie-in-kerberized-cluster/

<credentials>
    <credential name='hive_credentials' type='hcat'>
        <property>
            <name>hcat.metastore.uri</name>
            <value>thrift://metastore_server:9083</value>
        </property>
        <property>
            <name>hcat.metastore.principal</name>
            <value>hive/_HOST@KERBDOM.COM</value>
        </property>
    </credential>
</credentials>

and referenced the credential from the action:

<action name="actionAccountPartnerService" cred="hive_credentials">

With the above configuration I am getting the following error:

java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Invalid status -128
	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:609)
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:606)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:360)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1704)
	at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:606)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: Invalid status -128
	at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
	at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:184)
	at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
	at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
	... 10 more

Can anyone help?

Thank you

ram

15 Replies

Re: Oozie Sqoop action fails in kerberized cluster - HDP 2.5

Expert Contributor

@Ramakrishna Pratapa

From the exception, it looks like the HCat credential profile in workflow.xml has not been set properly for the secure cluster. Can you please take a look at the credential profile and ensure the key-value pairs are correct?

Re: Oozie Sqoop action fails in kerberized cluster - HDP 2.5

Contributor

Thank you for your help. I followed this link:

https://oozie.apache.org/docs/4.2.0/DG_ActionAuthentication.html#Oozie_Server_Configuration

 <workflow-app xmlns='uri:oozie:workflow:0.4' name='pig-wf'>
      <credentials>
         <credential name='hive_credentials' type='hcat'>
            <property>
               <name>hcat.metastore.uri</name>
               <value>HCAT_URI</value>
            </property>
            <property>
               <name>hcat.metastore.principal</name>
               <value>HCAT_PRINCIPAL</value>
            </property>
         </credential>
      </credentials>
      ...
      <action name='pig' cred='my-hcat-creds'>
         <pig>
            <job-tracker>JT</job-tracker>
            <name-node>NN</name-node>
            <configuration>
               <property>
                  <name>TESTING</name>
                  <value>${start}</value>
               </property>
            </configuration>
         </pig>
      </action>
      ...
   </workflow-app>

For HCAT_PRINCIPAL I used

hive/_HOST@MYDOMAIN.COM

and for HCAT_URI I used the value of

hive.metastore.uris

In the action I have defined the cred attribute:

<action name="actionAccountPartnerService" cred="hive_credentials">

Please let me know if you need more information

Thanks

Ram

Re: Oozie Sqoop action fails in kerberized cluster - HDP 2.5

Expert Contributor

@Ramakrishna Pratapa

The following workflow.xml, for Sqoop with HCatalog, works for me:

<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="hive-wf">
    <credentials>
        <credential name='hive_auth' type='hcat'>
        <property>
            <name>hcat.metastore.uri</name>
            <value>thrift://ambari.example.com:9083</value> 
        </property>
        <property>
            <name>hcat.metastore.principal</name>
            <value>hive/_HOST@EXAMPLE.COM</value>
        </property>
        </credential>
    </credentials>

    <start to="import-sqoop"/>

    <action name='import-sqoop' cred="hive_auth">
     <sqoop xmlns='uri:oozie:sqoop-action:0.4'>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
        
        <arg>import</arg>
        <arg>--verbose</arg>
        <arg>--connect</arg>
	...
	...
        <file>/user/ambari-qa/oozie_sqoop/hive-site.xml</file>
      </sqoop>
      <ok to="end"/>
      <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Sqoop action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Can you please try this workflow.xml? Next time you hit an issue, please include the HDP version as well.

Re: Oozie Sqoop action fails in kerberized cluster - HDP 2.5

Contributor

Thank you Peeyush, I will test and let you know. The HDP version is 2.5. I will post the results based on your workflow.xml soon.

Thanks

Ram

Re: Oozie Sqoop action fails in kerberized cluster - HDP 2.5

Contributor

Also, I captured the logs both from running `sqoop job -exec` directly and from executing the Sqoop action through Oozie.

--- Oozie Sqoop action log ---

4504 [main] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - Executing external HCatalog CLI process with args :-f,/tmp/hcat-script-1474494046050
4556 [Thread-19] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - WARNING: Use "yarn jar" to launch YARN applications.
4840 [Thread-19] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - SLF4J: Class path contains multiple SLF4J bindings.
4840 [Thread-19] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - SLF4J: Found binding in [jar:file:/usr/hdp/2.5.0.0-1245/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
4840 [Thread-19] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - SLF4J: Found binding in [jar:file:/hadoop/hadoop/yarn/local/filecache/171/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
4840 [Thread-19] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
4845 [Thread-19] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
9191 [Thread-19] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - OK
9192 [Thread-19] INFO org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities - Time taken: 2.205 seconds
12733 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://caldev01:9083
12790 [main] WARN hive.metastore - set_ugi() not successful, Likely cause: new client talking to old server. Continuing without it.
org.apache.thrift.transport.TTransportException
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:380)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:230)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:3729)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:3715)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:462)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:244)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:185)
	at org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient.<init>(HiveClientCache.java:330)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

--- Sqoop job execution log ---

16/09/22 18:12:09 INFO hcat.SqoopHCatUtilities: Executing external HCatalog CLI process with args :-f,/tmp/hcat-script-1474567929966
16/09/22 18:12:11 INFO hcat.SqoopHCatUtilities: SLF4J: Class path contains multiple SLF4J bindings.
16/09/22 18:12:11 INFO hcat.SqoopHCatUtilities: SLF4J: Found binding in [jar:file:/usr/hdp/2.5.0.0-1245/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
16/09/22 18:12:11 INFO hcat.SqoopHCatUtilities: SLF4J: Found binding in [jar:file:/usr/hdp/2.5.0.0-1245/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
16/09/22 18:12:11 INFO hcat.SqoopHCatUtilities: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
16/09/22 18:12:11 INFO hcat.SqoopHCatUtilities: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/09/22 18:12:23 INFO hcat.SqoopHCatUtilities: OK
16/09/22 18:12:23 INFO hcat.SqoopHCatUtilities: Time taken: 5.928 seconds
16/09/22 18:12:27 INFO hive.metastore: Trying to connect to metastore with URI thrift://caldev01:9083
16/09/22 18:12:27 INFO hive.metastore: Connected to metastore.
16/09/22 18:12:28 INFO hcat.SqoopHCatUtilities: HCatalog full table schema fields = [accountid, partnerid, serviceid, accountip, applicationid, tradingpartnerclientid, lastchangedby, lastchangeddate, messageversion]
16/09/22 18:12:28 INFO hcat.SqoopHCatUtilities: HCatalog table partitioning key fields = []

The Oozie run fails when connecting to the metastore. I am not sure whether some jars are missing on a data node or whether it is a Kerberos ticket problem.

Thanks

Ram

Re: Oozie Sqoop action fails in kerberized cluster - HDP 2.5

Contributor

I ran the following command

hcat -e 'show databases'

on all nodes, after switching to the user that starts the Oozie job and after kinit. It returns the full list of databases.

Please let me know if you need more information. I do not understand why the exception is not clearer.

Thanks

Ram

Re: Oozie Sqoop action fails in kerberized cluster - HDP 2.5

Contributor

Good morning,

We tested with --hive-import (not HCatalog), using the following Sqoop arguments:

<arg>import</arg>
<arg>--connect</arg>
<arg>jdbc:sqlserver://mySQLServer;database=Mydatabase</arg>
<arg>--table</arg>
<arg>Account</arg>
<arg>--username</arg>
<arg>testuser</arg>
<arg>--password</arg>
<arg>mypassword</arg>
<arg>--hive-import</arg>
<arg>--hive-table</arg>
<arg>Account_orc</arg>

It successfully created the table and imported the data, so it appears the issue is specific to HCatalog.
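Since the plain --hive-import path works while the HCatalog path fails, one thing worth checking (an assumption on my part, not something confirmed in this thread) is whether the HCatalog client inside the Oozie launcher sees the cluster's Kerberos-aware Hive configuration at all. Shipping hive-site.xml with the Sqoop action, as in the working example earlier in the thread, makes that explicit:

```xml
<!-- Inside the <sqoop> element of the action; the HDFS path below is an
     example placeholder, not a path taken from this thread -->
<file>${nameNode}/user/myuser/oozie_sqoop/hive-site.xml</file>
```

Without this, the launcher may fall back to a default hive-site.xml that lacks the metastore SASL/principal settings.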

Can anyone help?

Thank you

Ram

Re: Oozie Sqoop action fails in kerberized cluster - HDP 2.5

Expert Contributor

Thanks @Ramakrishna Pratapa for providing detailed logs.

I think this exception is happening due to a client/server configuration mismatch. Please ensure that hive.metastore.sasl.enabled=true is set in hive-site.xml on both the client and the server.

Re: Oozie Sqoop action fails in kerberized cluster - HDP 2.5

Contributor

I checked both places and it is set as follows:

<property>
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
</property>

Thanks

Ram