I have a Kerberized, HTTPS-enabled HDP 2.5 cluster, and we are trying to run a Hadoop Hive Task from SSIS. As I understand it, SSIS uses WebHCat to run Hive queries. I uploaded a sample CSV file into HDFS and created a Hive table separately. I am now trying to load the data into this Hive table using the simple inline script below:
load data INPATH <HDFS Filename> OVERWRITE into table <table_name>
When I execute the SSIS package, I get the following error in both webhcat.log and hivemetastore.log:
Caused by: MetaException(message:User: HTTP/_HOST@REALM is not allowed to impersonate <username>)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
Based on various online references, I updated the following proxy settings in core-site.xml:
hadoop.proxyuser.HTTP.groups=* (Note: Originally this field had 'users', changed to *)
hadoop.proxyuser.HTTP.hosts=*
In webhcat-site.xml, I have the following as well:
webhcat.proxyuser.hcat.groups=*
webhcat.proxyuser.hcat.hosts=*
webhcat.proxyuser.HTTP.hosts=*
webhcat.proxyuser.HTTP.groups=*
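In case it helps anyone reproduce this, here is a minimal Python sketch for listing the proxyuser entries that are actually present in a *-site.xml file, so they can be compared with the settings listed above. The function name proxyuser_settings and the sample XML are my own illustration, not part of Hadoop or WebHCat; it only assumes the standard <configuration>/<property>/<name>/<value> layout of Hadoop site files.

```python
import xml.etree.ElementTree as ET


def proxyuser_settings(site_xml_text, prefix="hadoop.proxyuser."):
    """Return {name: value} for every property whose name starts with prefix.

    Hypothetical helper for illustration; it only parses the XML text it is
    given and does not contact the cluster.
    """
    root = ET.fromstring(site_xml_text)
    settings = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name.startswith(prefix):
            settings[name] = (prop.findtext("value") or "").strip()
    return settings


# Assumed sample mirroring the core-site.xml values quoted in this post.
SAMPLE_CORE_SITE = """<configuration>
  <property><name>hadoop.proxyuser.HTTP.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.HTTP.hosts</name><value>*</value></property>
  <property><name>fs.defaultFS</name><value>hdfs://example</value></property>
</configuration>"""

if __name__ == "__main__":
    for name, value in sorted(proxyuser_settings(SAMPLE_CORE_SITE).items()):
        print(f"{name}={value}")
```

To check a real file, pass the text of the core-site.xml that the service in question actually reads; for webhcat-site.xml, call it with prefix="webhcat.proxyuser.".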
I also get the same error in webhcat.log when I run this curl command:
curl -i --negotiate -u : 'http://<web-hcat-server-name>:50111/templeton/v1/ddl/database/default'
In my SSIS Hadoop Connection Manager, I have the WebHCat connection enabled, selected Kerberos as the authentication method, and entered my AD username and password. The Test Connection check from the Hadoop Connection Manager to WebHCat reports success, but the Hadoop Hive Task still fails when it runs.