Created 08-08-2016 08:28 PM
I'm able to run a TDCH export from my HDP 2.3 cluster to Teradata using the following command:
hadoop jar ${USERLIBTDCH} com.teradata.hadoop.tool.TeradataExportTool \
  -libjars ${LIB_JARS} \
  -url ${FULL_TD_URL} \
  -username ${TD_USER} \
  -password ${TD_PW} \
  -jobtype hive \
  -fileformat orcfile \
  -method batch.insert \
  -nummappers 10 \
  -sourcedatabase ${HIVE_DB} \
  -sourcetable ${HIVE_TABLE} \
  -sourcefieldnames "${TABLE_COLUMN_NAMES}" \
  -stagedatabase ${TD_STAGING_DB} \
  -errortabledatabase ${TD_STAGING_DB} \
  -targettable ${TD_TABLE} \
  -targetfieldnames "${TABLE_COLUMN_NAMES}"
Everything works fine when I run my script as the hive user. I'm switching the scripts over to use a service account, but I get the following error when running the same script:
16/08/08 14:48:43 INFO tool.ConnectorExportTool: ConnectorExportTool starts at 1470685723042
16/08/08 14:48:43 INFO common.ConnectorPlugin: load plugins in file:/tmp/hadoop-unjar6402968921427571136/teradata.connector.plugins.xml
16/08/08 14:48:43 INFO hive.metastore: Trying to connect to metastore with URI thrift://our-fqdn:9083
16/08/08 14:48:44 INFO hive.metastore: Connected to metastore.
16/08/08 14:48:44 INFO processor.TeradataOutputProcessor: output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor starts at: 1470685724079
16/08/08 14:48:44 INFO processor.TeradataOutputProcessor: output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor ends at: 1470685724079
16/08/08 14:48:44 INFO processor.TeradataOutputProcessor: the total elapsed time of output postprocessor com.teradata.connector.teradata.processor.TeradataBatchInsertProcessor is: 0s
16/08/08 14:48:44 INFO tool.ConnectorExportTool: com.teradata.connector.common.exception.ConnectorException: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
    at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
    at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
    at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1218)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1204)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.tableExists(HiveMetaStoreClient.java:1274)
    at com.teradata.connector.hive.processor.HiveInputProcessor.inputPreProcessor(HiveInputProcessor.java:85)
    at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:116)
    at com.teradata.connector.common.tool.ConnectorExportTool.run(ConnectorExportTool.java:62)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at com.teradata.hadoop.tool.TeradataExportTool.main(TeradataExportTool.java:29)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:140)
    at com.teradata.connector.common.tool.ConnectorExportTool.run(ConnectorExportTool.java:62)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at com.teradata.hadoop.tool.TeradataExportTool.main(TeradataExportTool.java:29)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
16/08/08 14:48:44 INFO tool.ConnectorExportTool: job completed with exit code 10000
I figure this has to be some sort of permissions issue because it works as hive but not my service account. What other permissions should I check?
TDCH 1.4.1. Kerberized HDP 2.3 cluster.
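One sanity check I can run is to hit the same metastore as the service account outside of TDCH (the keytab path below is illustrative; substitute your own):

# Authenticate as the service account
kinit -kt /etc/security/keytabs/serviceaccount.keytab serviceaccount@MY.REALM.EXAMPLE.COM

# The Hive CLI talks to the same metastore (thrift://our-fqdn:9083) that
# TDCH uses, so this exercises the same access path as the failing export
hive -e "USE ${HIVE_DB}; SHOW TABLES;"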
Created 08-09-2016 06:01 PM
I think what you need to do is enable Hive impersonation. Hive will still run under the "hive" user, but it will impersonate the service account you create.
Created 08-09-2016 06:10 PM
@mqureshi We have hive.server2.enable.doAs set to false. I'd expect that if TDCH runs any Hive queries, they would run as the service account, while the data in HDFS would still be accessed as the hive user. I don't see anything showing up as denied in the Ranger audit log either.
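Also, the stack trace shows TDCH connecting straight to the metastore (thrift://our-fqdn:9083) with the client's own Kerberos identity, so HiveServer2 impersonation shouldn't even be in play. These checks show which identity and short name the cluster sees (the principal is ours; substitute your own):

# Confirm which ticket the TDCH job will present to the metastore
klist

# Ask Hadoop how it maps that principal to a short name via
# hadoop.security.auth_to_local
hadoop org.apache.hadoop.security.HadoopKerberosName serviceaccount@MY.REALM.EXAMPLE.COM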
Created 08-09-2016 06:17 PM
HiveServer2 runs as the hive user, and when hive.server2.enable.doAs is set to false, all queries are submitted as the user running HiveServer2. If you need queries submitted as your service account, hive.server2.enable.doAs must be set to true. You will also need the following proxy-user settings in your core-site.xml:
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>host1,host2</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>group1,group2</value>
</property>
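If you just want to verify impersonation works before locking it down, wildcards are a common first pass (for testing only; restrict hosts and groups in production), and the NameNode can pick up proxy-user changes without a full restart:

<!-- Testing only: allow hive to impersonate from any host, any group -->
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>*</value>
</property>

# Refresh proxy-user settings on the NameNode
hdfs dfsadmin -refreshSuperUserGroupsConfiguration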
Created 08-09-2016 06:42 PM
@mqureshi I want to leave hive.server2.enable.doAs set to false since we'll have other users accessing hive and need to keep the data in HDFS secure. I feel like my service account should have the ability to read from the hive metastore already.
Created 08-09-2016 06:42 PM
And that's why you would use Ranger: users who shouldn't have access can simply be denied permission through Ranger policies. Impersonation is not unique to Hadoop or Hive; it's a standard pattern, and large financial institutions, health care organizations, and other enterprises use Hadoop this way while remaining fully compliant with laws and regulations.
In short: enable impersonation, then use Ranger to limit access to your service user.
Created 08-09-2016 07:13 PM
Found the issue in the hivemetastore.log:
2016-08-09 13:21:08,123 ERROR [pool-5-thread-199]: server.TThreadPoolServer (TThreadPoolServer.java:run(296)) - Error occurred during processing of message.
java.lang.IllegalArgumentException: Illegal principal name serviceaccount@MY.REALM.EXAMPLE.COM: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to serviceaccount@MY.REALM.EXAMPLE.COM
    at org.apache.hadoop.security.User.<init>(User.java:50)
    at org.apache.hadoop.security.User.<init>(User.java:43)
    at org.apache.hadoop.security.UserGroupInformation.createProxyUser(UserGroupInformation.java:1283)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:672)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to serviceaccount@MY.REALM.EXAMPLE.COM
    at org.apache.hadoop.security.authentication.util.KerberosName.getShortName(KerberosName.java:389)
    at org.apache.hadoop.security.User.<init>(User.java:48)
    ... 7 more
It turns out the Hive Metastore was missed in the list of services to restart after we updated our realm rule mapping (hadoop.security.auth_to_local). After restarting it, TDCH is working fine.
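For anyone hitting the same NoMatchingRule error: the metastore maps incoming principals to short names using hadoop.security.auth_to_local, and every service doing that mapping has to be restarted to pick up rule changes. A minimal mapping covering the principal above might look like this (the rule itself is illustrative; DEFAULT only covers principals in the cluster's default realm):

<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](.*@MY\.REALM\.EXAMPLE\.COM)s/@.*//
    DEFAULT
  </value>
</property>

The HadoopKerberosName command mentioned earlier in the thread is a quick way to confirm a rule resolves as expected before restarting anything.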