
SSIS Hive Task using WebHCAT

Explorer

I have a Kerberized, HTTPS-enabled HDP 2.5 cluster, and we are trying to run a Hadoop Hive Task from SSIS. I understand SSIS uses WebHCat to run Hive queries. A sample CSV file has been uploaded into HDFS and a Hive table created separately; I am trying to load data into this Hive table using the simple inline script option below:

LOAD DATA INPATH '<HDFS Filename>' OVERWRITE INTO TABLE <table_name>;
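
For reference, I believe the SSIS Hive Task boils down to roughly this WebHCat call (a sketch: the statusdir value is an arbitrary HDFS directory for job output, and the placeholders match the ones above):

curl -i --negotiate -u : \
  -d execute="LOAD DATA INPATH '<HDFS Filename>' OVERWRITE INTO TABLE <table_name>;" \
  -d statusdir="/tmp/webhcat.output" \
  'http://<web-hcat-server-name>:50111/templeton/v1/hive'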

When I execute the SSIS package, I get the following error in webhcat.log and hivemetastore.log:

Caused by: MetaException(message:User: HTTP/_HOST@REALM is not allowed to impersonate <username>)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_delegation_token_result$get_delegation_token_resultStandardScheme.read(ThriftHiveMetastore.java)

Based on various online references, I updated the following proxy settings in core-site.xml:

hadoop.proxyuser.HTTP.groups=*  (Note: this field originally had 'users'; changed to *)
hadoop.proxyuser.HTTP.hosts=*
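
(If you are editing core-site.xml directly rather than through Ambari, those two entries take the usual Hadoop property form:)

<property>
  <name>hadoop.proxyuser.HTTP.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.HTTP.groups</name>
  <value>*</value>
</property>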

In webhcat-site.xml, I have the following as well:

webhcat.proxyuser.hcat.groups=*
webhcat.proxyuser.hcat.hosts=*
webhcat.proxyuser.HTTP.hosts=*
webhcat.proxyuser.HTTP.groups=*
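
As a sanity check that the values are actually on disk, I run this on the WebHCat host (note that hdfs getconf reads the client-side configuration, so it confirms the files, not what a running service has already loaded):

hdfs getconf -confKey hadoop.proxyuser.HTTP.hosts
hdfs getconf -confKey hadoop.proxyuser.HTTP.groups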

I get the same error in webhcat.log when testing outside SSIS with this curl command:

curl -i --negotiate -u : 'http://<web-hcat-server-name>:50111/templeton/v1/ddl/database/default'
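
Before that call I make sure a valid Kerberos ticket exists; the status endpoint is a simpler smoke test (the principal below is a placeholder for my AD account):

kinit <ad-username>@<REALM>
klist
curl -i --negotiate -u : 'http://<web-hcat-server-name>:50111/templeton/v1/status'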

In my SSIS Hadoop Connection Manager, I have the WebHCat connection enabled, Kerberos selected as the authentication method, and my AD username and password supplied. The Test Connection verification from the Hadoop Connection Manager to WebHCat succeeds, but the Hadoop Hive Task fails when the package runs.

2 REPLIES

Explorer

Has anyone else faced the same issue with WebHCat trying to write to Hive tables on a Kerberized cluster?

Contributor

@anrathen 


Can you adjust the Hive proxy settings below and test once?

hadoop.proxyuser.hive.hosts=*
hadoop.proxyuser.hive.groups=*
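
A sketch of how to apply this, assuming the values are changed in core-site.xml: the NameNode and ResourceManager can pick up new proxyuser settings via a refresh, while services that read core-site.xml at startup (Hive Metastore, HiveServer2, WebHCat) need a restart, e.g. through Ambari.

hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration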

