TDCH: TTransportException when running as user other than hive

Expert Contributor

I'm able to run a TDCH export from my HDP 2.3 cluster to Teradata using the following command:

hadoop jar ${USERLIBTDCH} com.teradata.hadoop.tool.TeradataExportTool \
-libjars ${LIB_JARS} \
-url ${FULL_TD_URL} \
-username ${TD_USER} \
-password ${TD_PW} \
-jobtype hive \
-fileformat orcfile \
-method batch.insert \
-nummappers 10 \
-sourcedatabase ${HIVE_DB} \
-sourcetable ${HIVE_TABLE} \
-sourcefieldnames "${TABLE_COLUMN_NAMES}" \
-stagedatabase ${TD_STAGING_DB} \
-errortabledatabase ${TD_STAGING_DB} \
-targettable ${TD_TABLE} \
-targetfieldnames "${TABLE_COLUMN_NAMES}"

Everything works fine when I run the script as the hive user. I'm switching the scripts over to use a service account, but the same script fails with the following error:

16/08/08 14:48:43 INFO tool.ConnectorExportTool: ConnectorExportTool starts at 1470685723042
16/08/08 14:48:43 INFO common.ConnectorPlugin: load plugins in file:/tmp/hadoop-unjar6402968921427571136/teradata.connector.plugins.xml
16/08/08 14:48:43 INFO hive.metastore: Trying to connect to metastore with URI thrift://our-fqdn:9083
16/08/08 14:48:44 INFO hive.metastore: Connected to metastore.
16/08/08 14:48:44 INFO processor.TeradataOutputProcessor: output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor starts at:  1470685724079
16/08/08 14:48:44 INFO processor.TeradataOutputProcessor: output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor ends at:  1470685724079
16/08/08 14:48:44 INFO processor.TeradataOutputProcessor: the total elapsed time of output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor is: 0s
16/08/08 14:48:44 INFO tool.ConnectorExportTool: com.teradata.connector.common.exception.ConnectorException: org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
        at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
        at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
        at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1218)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1204)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.tableExists(HiveMetaStoreClient.java:1274)
        at com.teradata.connector.hive.processor.HiveInputProcessor.inputPreProcessor(HiveInputProcessor.java:85)
        at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:116)
        at com.teradata.connector.common.tool.ConnectorExportTool.run(ConnectorExportTool.java:62)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at com.teradata.hadoop.tool.TeradataExportTool.main(TeradataExportTool.java:29)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)


        at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:140)
        at com.teradata.connector.common.tool.ConnectorExportTool.run(ConnectorExportTool.java:62)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at com.teradata.hadoop.tool.TeradataExportTool.main(TeradataExportTool.java:29)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)


16/08/08 14:48:44 INFO tool.ConnectorExportTool: job completed with exit code 10000


I figure this has to be some sort of permissions issue, because it works as the hive user but not as my service account. What other permissions should I check?

TDCH 1.4.1. Kerberized HDP 2.3 cluster.
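
In case it's relevant, the service account authenticates from a keytab before the job runs; something like this (the keytab path is an example, not our real one):

# authenticate as the service account, then confirm the active principal
kinit -kt /etc/security/keytabs/serviceaccount.keytab serviceaccount@MY.REALM.EXAMPLE.COM
klist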

1 ACCEPTED SOLUTION

Expert Contributor

Found the issue in the hivemetastore.log:

2016-08-09 13:21:08,123 ERROR [pool-5-thread-199]: server.TThreadPoolServer (TThreadPoolServer.java:run(296)) - Error occurred during processing of message.
java.lang.IllegalArgumentException: Illegal principal name serviceaccount@MY.REALM.EXAMPLE.COM: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to serviceaccount@MY.REALM.EXAMPLE.COM
        at org.apache.hadoop.security.User.<init>(User.java:50)
        at org.apache.hadoop.security.User.<init>(User.java:43)
        at org.apache.hadoop.security.UserGroupInformation.createProxyUser(UserGroupInformation.java:1283)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:672)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to serviceaccount@MY.REALM.EXAMPLE.COM
        at org.apache.hadoop.security.authentication.util.KerberosName.getShortName(KerberosName.java:389)
        at org.apache.hadoop.security.User.<init>(User.java:48)
        ... 7 more

It turns out the Hive Metastore was missed in the list of services to restart after we updated our realm rule mapping (hadoop.security.auth_to_local). After restarting it, TDCH is working fine.
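
For anyone who hits the same thing: a quick way to check how your auth_to_local rules map a given principal (run on a node with the Hadoop client, after the config change) is:

hadoop org.apache.hadoop.security.HadoopKerberosName serviceaccount@MY.REALM.EXAMPLE.COM
# prints something like: Name: serviceaccount@MY.REALM.EXAMPLE.COM to serviceaccount

# An illustrative rule that strips the realm (your actual rules will differ):
# RULE:[1:$1@$0](.*@MY\.REALM\.EXAMPLE\.COM)s/@.*//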


6 REPLIES

Super Guru
@Kit Menke

I think what you need to do is allow Hive impersonation. Hive will still run as the "hive" user, but it will impersonate the service user you create.

Expert Contributor

@mqureshi We have hive.server2.enable.doAs set to false. My expectation is that if TDCH runs any Hive queries, they run as the service account, while the data in HDFS is still accessed as the hive user. I don't see anything show up as denied in the Ranger audit log either.

Super Guru
@Kit Menke

HiveServer2 runs as the hive user, and if hive.server2.enable.doAs is set to false, then all queries are submitted as the user running HiveServer2. If you need queries submitted as your service user, hive.server2.enable.doAs must be set to true. You will also need to add the following to your core-site.xml:

<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>host1,host2</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>group1,group2</value>
</property>
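
If you'd rather not enumerate hosts and groups, the wildcard form is also commonly used (whether that's acceptable depends on your security requirements):

<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>*</value>
</property>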

Expert Contributor

@mqureshi I want to leave hive.server2.enable.doAs set to false, since we'll have other users accessing Hive and we need to keep the data in HDFS secure. I feel like my service account should already be able to read from the Hive metastore.

Super Guru

And that's why you would use Ranger. Other users who shouldn't have access won't be able to get it, because you can deny them permission through Ranger. Impersonation is not unique to Hadoop or Hive; this is how it's commonly done, and large financial institutions, health care organizations, and other enterprises use Hadoop this way while staying fully compliant with all laws and regulations.

You need to enable impersonation and then use Ranger to limit access to your service user.
