TDCH: TTransportException when running as user other than hive
Labels: Apache Hive
Created 08-08-2016 08:28 PM
I'm able to run a TDCH export from my HDP 2.3 cluster to Teradata using the following command:
hadoop jar ${USERLIBTDCH} com.teradata.hadoop.tool.TeradataExportTool \
    -libjars ${LIB_JARS} \
    -url ${FULL_TD_URL} \
    -username ${TD_USER} \
    -password ${TD_PW} \
    -jobtype hive \
    -fileformat orcfile \
    -method batch.insert \
    -nummappers 10 \
    -sourcedatabase ${HIVE_DB} \
    -sourcetable ${HIVE_TABLE} \
    -sourcefieldnames "${TABLE_COLUMN_NAMES}" \
    -stagedatabase ${TD_STAGING_DB} \
    -errortabledatabase ${TD_STAGING_DB} \
    -targettable ${TD_TABLE} \
    -targetfieldnames "${TABLE_COLUMN_NAMES}"
Everything works fine when I run my script as the hive user. I'm switching the scripts over to use a service account, but I get the following error when running the same script:
16/08/08 14:48:43 INFO tool.ConnectorExportTool: ConnectorExportTool starts at 1470685723042
16/08/08 14:48:43 INFO common.ConnectorPlugin: load plugins in file:/tmp/hadoop-unjar6402968921427571136/teradata.connector.plugins.xml
16/08/08 14:48:43 INFO hive.metastore: Trying to connect to metastore with URI thrift://our-fqdn:9083
16/08/08 14:48:44 INFO hive.metastore: Connected to metastore.
16/08/08 14:48:44 INFO processor.TeradataOutputProcessor: output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor starts at: 1470685724079
16/08/08 14:48:44 INFO processor.TeradataOutputProcessor: output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor ends at: 1470685724079
16/08/08 14:48:44 INFO processor.TeradataOutputProcessor: the total elapsed time of output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor is: 0s
16/08/08 14:48:44 INFO tool.ConnectorExportTool: com.teradata.connector.common.exception.ConnectorException: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
    at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
    at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
    at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1218)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1204)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.tableExists(HiveMetaStoreClient.java:1274)
    at com.teradata.connector.hive.processor.HiveInputProcessor.inputPreProcessor(HiveInputProcessor.java:85)
    at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:116)
    at com.teradata.connector.common.tool.ConnectorExportTool.run(ConnectorExportTool.java:62)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at com.teradata.hadoop.tool.TeradataExportTool.main(TeradataExportTool.java:29)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:140)
    at com.teradata.connector.common.tool.ConnectorExportTool.run(ConnectorExportTool.java:62)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at com.teradata.hadoop.tool.TeradataExportTool.main(TeradataExportTool.java:29)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
16/08/08 14:48:44 INFO tool.ConnectorExportTool: job completed with exit code 10000
I figure this has to be some sort of permissions issue, because it works as hive but not as my service account. What other permissions should I check?
Environment: TDCH 1.4.1, Kerberized HDP 2.3 cluster.
Created 08-09-2016 06:01 PM
I think what you need to do is enable Hive impersonation. Hive will still run under the "hive" user, but it will impersonate the service user you create.
Created 08-09-2016 06:10 PM
@mqureshi We have hive.server2.enable.doAs set to false. My expectation is that any Hive queries TDCH runs would execute as the service account, while the data in HDFS would still be accessed as the hive user. I don't see anything showing up as denied in the Ranger audit log either.
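As a quick sanity check, the effective value can be confirmed on the HiveServer2 host. The path below assumes the standard HDP config layout; adjust if yours differs:

# Show the configured value of hive.server2.enable.doAs
grep -A1 'hive.server2.enable.doAs' /etc/hive/conf/hive-site.xml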
Created 08-09-2016 06:17 PM
HiveServer2 runs as the hive user, and when hive.server2.enable.doAs is set to false, all queries are submitted as the user running HiveServer2. If you need to submit queries as your service user, hive.server2.enable.doAs must be set to true. You will also need to add the following to your core-site.xml:
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>host1,host2</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>group1,group2</value>
</property>
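For completeness, a sketch of the corresponding hive-site.xml change (the property name is the one discussed above; HiveServer2 must be restarted for it to take effect):

<!-- hive-site.xml: enable impersonation so queries run as the connecting user -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>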
Created 08-09-2016 06:42 PM
@mqureshi I want to leave hive.server2.enable.doAs set to false, since we'll have other users accessing Hive and we need to keep the data in HDFS secure. I feel like my service account should already be able to read from the Hive metastore.
Created 08-09-2016 06:42 PM
And that's why you would use Ranger. Users who shouldn't have access won't get it, because you can deny them permission through Ranger policies. Impersonation is not unique to Hadoop or Hive; this is how it's done, and large financial institutions, health care organizations, and other enterprises use Hadoop this way while remaining fully compliant with all laws and regulations.
You need to enable impersonation and then use Ranger to limit access for your service user.
Created 08-09-2016 07:13 PM
Found the issue in the hivemetastore.log:
2016-08-09 13:21:08,123 ERROR [pool-5-thread-199]: server.TThreadPoolServer (TThreadPoolServer.java:run(296)) - Error occurred during processing of message.
java.lang.IllegalArgumentException: Illegal principal name serviceaccount@MY.REALM.EXAMPLE.COM: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to serviceaccount@MY.REALM.EXAMPLE.COM
    at org.apache.hadoop.security.User.<init>(User.java:50)
    at org.apache.hadoop.security.User.<init>(User.java:43)
    at org.apache.hadoop.security.UserGroupInformation.createProxyUser(UserGroupInformation.java:1283)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:672)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to serviceaccount@MY.REALM.EXAMPLE.COM
    at org.apache.hadoop.security.authentication.util.KerberosName.getShortName(KerberosName.java:389)
    at org.apache.hadoop.security.User.<init>(User.java:48)
    ... 7 more
Turns out the Hive Metastore was missed in the list of services to be restarted after updating our realm rule mapping (hadoop.security.auth_to_local). TDCH is working fine now.
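For anyone who hits the same NoMatchingRule error: the fix is an auth_to_local rule in core-site.xml that maps the service principal to a local short name. Below is a hypothetical rule (the exact pattern depends on your realm and naming conventions), followed by the standard Hadoop utility for testing the mapping without restarting anything:

<!-- core-site.xml: map principals in our realm to their local short names -->
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](.*@MY\.REALM\.EXAMPLE\.COM)s/@.*//
    DEFAULT
  </value>
</property>

# Check how a principal maps with the current client configuration:
hadoop org.apache.hadoop.security.HadoopKerberosName serviceaccount@MY.REALM.EXAMPLE.COM
# Expected output: Name: serviceaccount@MY.REALM.EXAMPLE.COM to serviceaccount

Keep in mind that each service reads this property at startup, so the Hive Metastore (and anything else that caches it) needs a restart after the change; that restart was exactly what we missed.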
