
Accessing Hive from spark without using kinit

Solved

Expert Contributor

Hello,

I need to run a Spark (1.5.2) job in a kerberized environment (I am currently testing on the HDP 2.3.4 sandbox). The job needs to be able to read from and write to Hive (I am using HiveContext). I am also using master = local[*], which is similar to spark-shell.

I am able to do this in Spark by running kinit beforehand. However, is there any other way to authenticate programmatically within the Spark job?

For example, I am able to read/write on kerberized HDFS by running the following before the Spark code, without kinit. Is there something similar I can do for Hive?

// The following works for HDFS, but not for Hive.
// SERVER_PRINCIPAL_KEY and SERVER_KEYTAB_KEY are configuration key names defined elsewhere in my code.
System.setProperty("java.security.krb5.conf", krb5ConfPath);
final Configuration newConf = new Configuration();
newConf.set(SERVER_PRINCIPAL_KEY, "spark-Sandbox@KRB.HDP");
newConf.set(SERVER_KEYTAB_KEY, keyTabPath);
LOG.info("Logging in now... (this replaces kinit)");
org.apache.hadoop.security.SecurityUtil.login(newConf, SERVER_KEYTAB_KEY, SERVER_PRINCIPAL_KEY, "sandbox.hortonworks.com");
LOG.info("Logged in! (this replaces kinit)");
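For comparison, Hadoop also exposes a direct keytab-login call, UserGroupInformation.loginUserFromKeytab, which sets the static login user that HDFS and Hive Metastore clients pick up. A minimal sketch, assuming the same keytab and principal as above (this needs a reachable KDC and valid keytab to actually run):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabLogin {
    public static void main(String[] args) throws Exception {
        // Assumed paths; in the question these come from krb5ConfPath and keyTabPath.
        String krb5ConfPath = args[0];  // e.g. path to krb5.conf
        String keyTabPath = args[1];    // e.g. path to the spark keytab

        System.setProperty("java.security.krb5.conf", krb5ConfPath);

        // Tell Hadoop's security layer to use Kerberos before logging in.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Sets the process-wide login user from the keytab, replacing kinit.
        UserGroupInformation.loginUserFromKeytab("spark-Sandbox@KRB.HDP", keyTabPath);
    }
}
```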

Thanks in advance.

UPDATE:

I have enabled lots of logging and tracked it down to the following differences in the log:

with kinit I get:

DEBUG	2016-03-16 11:12:09,557	6889	org.apache.hadoop.security.Groups	[main]	Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
>>> KrbCreds found the default ticket granting ticket in credential cache.
>>> Obtained TGT from LSA: Credentials:
      client=spark-Sandbox@KRB.HDP
      server=krbtgt/KRB.HDP@KRB.HDP
    authTime=20160316111142Z
     endTime=20160317111142Z
   renewTill=null
       flags=FORWARDABLE;INITIAL
EType (skey)=17
   (tkt key)=18
DEBUG	2016-03-16 11:12:09,560	6892	org.apache.hadoop.security.UserGroupInformation	[main]	hadoop login
DEBUG	2016-03-16 11:12:09,561	6893	org.apache.hadoop.security.UserGroupInformation	[main]	hadoop login commit
DEBUG	2016-03-16 11:12:09,562	6894	org.apache.hadoop.security.UserGroupInformation	[main]	using kerberos user:spark-Sandbox@KRB.HDP
DEBUG	2016-03-16 11:12:09,562	6894	org.apache.hadoop.security.UserGroupInformation	[main]	Using user: "spark-Sandbox@KRB.HDP" with name spark-Sandbox@KRB.HDP
DEBUG	2016-03-16 11:12:09,562	6894	org.apache.hadoop.security.UserGroupInformation	[main]	User entry: "spark-Sandbox@KRB.HDP"
DEBUG	2016-03-16 11:12:09,565	6897	org.apache.hadoop.security.UserGroupInformation	[main]	UGI loginUser:spark-Sandbox@KRB.HDP (auth:KERBEROS)
DEBUG	2016-03-16 11:12:09,567	6899	org.apache.hadoop.security.UserGroupInformation	[TGT Renewer for spark-Sandbox@KRB.HDP]	Found tgt Ticket (hex) = 

whereas logging in from code (with no kinit) currently gives me:

DEBUG	2016-03-16 11:09:58,902	7194	org.apache.hadoop.security.Groups	[main]	Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
>>>KinitOptions cache name is C:\Users\davidtam\krb5cc_davidtam
>> Acquire default native Credentials
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 17 16 23.
>>> Found no TGT's in LSA
DEBUG	2016-03-16 11:09:58,910	7202	org.apache.hadoop.security.UserGroupInformation	[main]	hadoop login
DEBUG	2016-03-16 11:09:58,910	7202	org.apache.hadoop.security.UserGroupInformation	[main]	hadoop login commit
DEBUG	2016-03-16 11:09:58,911	7203	org.apache.hadoop.security.UserGroupInformation	[main]	using kerberos user:null
DEBUG	2016-03-16 11:09:58,912	7204	org.apache.hadoop.security.UserGroupInformation	[main]	using local user:NTUserPrincipal: davidtam
DEBUG	2016-03-16 11:09:58,912	7204	org.apache.hadoop.security.UserGroupInformation	[main]	Using user: "NTUserPrincipal: davidtam" with name davidtam
DEBUG	2016-03-16 11:09:58,912	7204	org.apache.hadoop.security.UserGroupInformation	[main]	User entry: "davidtam"
DEBUG	2016-03-16 11:09:58,914	7206	org.apache.hadoop.security.UserGroupInformation	[main]	UGI loginUser:davidtam (auth:KERBEROS)
INFO	2016-03-16 11:09:58,931	7223	hive.metastore	[main]	Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
DEBUG	2016-03-16 11:09:58,963	7255	org.apache.hadoop.security.UserGroupInformation	[main]	PrivilegedAction as:c009003 (auth:KERBEROS) from:org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
DEBUG	2016-03-16 11:09:58,963	7255	org.apache.thrift.transport.TSaslTransport	[main]	opening transport org.apache.thrift.transport.TSaslClientTransport@7c206b14
>>>KinitOptions cache name is C:\Users\davidtam\krb5cc_davidtam
>> Acquire default native Credentials
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 17 16 23.
>>> Found no TGT's in LSA

I am running on Windows, connecting to the sandbox.


4 REPLIES

Re: Accessing Hive from spark without using kinit

Did you try passing the properties below as command-line parameters when running spark-submit? There were also a few Kerberos-related issues in Spark 1.4 and 1.5, so it's better to try this on the Spark 1.6 release.

--principal <principal name> --keytab /etc/security/keytabs/spark.keytab
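For reference, a full invocation with these flags might look like the following (the class, jar, and keytab names here are placeholders, and the principal is the sandbox one from the question):

```shell
spark-submit \
  --master local[*] \
  --principal spark-Sandbox@KRB.HDP \
  --keytab /etc/security/keytabs/spark.keytab \
  --class com.example.MyHiveJob \
  my-hive-job.jar
```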

Re: Accessing Hive from spark without using kinit

Expert Contributor

@Jitendra Yadav thanks for your reply. I believe those are for YARN, while I am trying to run master = local[*], similar to spark-shell, on the sandbox.

I am using Spark 1.5.2 on HDP 2.3.4.

Re: Accessing Hive from spark without using kinit

@David Tam

The same configs should work for local mode as well; initially they were made for YARN only, but later they became applicable to local mode too. As I said earlier, it's better to try this on the Spark 1.6 release.

Please refer to this Jira and its pull requests: https://issues.apache.org/jira/browse/SPARK-11821
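For reference, in Spark 1.x the --principal and --keytab flags map to configuration properties, so the same credentials can also be supplied via spark-defaults.conf (the keytab path below is a placeholder):

```
spark.yarn.principal  spark-Sandbox@KRB.HDP
spark.yarn.keytab     /etc/security/keytabs/spark.keytab
```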

Re: Accessing Hive from spark without using kinit

Expert Contributor

@Jitendra Yadav thanks, I just had a look at the Jira. I think in this case I will need to wait until we upgrade to Spark 1.6 then.

Thanks!
