Accessing Hive from spark without using kinit

Hello,

I need to run a Spark (1.5.2) job in a Kerberized environment (I am currently testing on the HDP 2.3.4 sandbox). The job needs to be able to read from and write to Hive (I am using HiveContext). I am also using master = local[*], which is similar to spark-shell.

I am able to do this in Spark by running kinit beforehand. However, is there any other way to authenticate programmatically within the Spark job?

For example, I am able to read from and write to Kerberized HDFS without kinit by running the following before the Spark code. Is there something similar I can do for Hive?

// following works for HDFS, but not for Hive
System.setProperty("java.security.krb5.conf", krb5ConfPath);
final Configuration newConf = new Configuration();
newConf.set(SERVER_PRINCIPAL_KEY, "spark-Sandbox@KRB.HDP");
newConf.set(SERVER_KEYTAB_KEY, keyTabPath);
LOG.info("Logging in now... ******************* THIS REPLACE kinit **************************");
org.apache.hadoop.security.SecurityUtil.login(newConf, SERVER_KEYTAB_KEY, SERVER_PRINCIPAL_KEY, "sandbox.hortonworks.com");
LOG.info("Logged  in !!!    ******************* THIS REPLACE kinit **************************");

Thanks in advance.

UPDATE:

I have enabled lots of logging and tracked it down to the following differences in the log:

with kinit I get:

DEBUG	2016-03-16 11:12:09,557	6889	org.apache.hadoop.security.Groups	[main]	Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
>>> KrbCreds found the default ticket granting ticket in credential cache.
>>> Obtained TGT from LSA: Credentials:
      client=spark-Sandbox@KRB.HDP
      server=krbtgt/KRB.HDP@KRB.HDP
    authTime=20160316111142Z
     endTime=20160317111142Z
   renewTill=null
       flags=FORWARDABLE;INITIAL
EType (skey)=17
   (tkt key)=18
DEBUG	2016-03-16 11:12:09,560	6892	org.apache.hadoop.security.UserGroupInformation	[main]	hadoop login
DEBUG	2016-03-16 11:12:09,561	6893	org.apache.hadoop.security.UserGroupInformation	[main]	hadoop login commit
DEBUG	2016-03-16 11:12:09,562	6894	org.apache.hadoop.security.UserGroupInformation	[main]	using kerberos user:spark-Sandbox@KRB.HDP
DEBUG	2016-03-16 11:12:09,562	6894	org.apache.hadoop.security.UserGroupInformation	[main]	Using user: "spark-Sandbox@KRB.HDP" with name spark-Sandbox@KRB.HDP
DEBUG	2016-03-16 11:12:09,562	6894	org.apache.hadoop.security.UserGroupInformation	[main]	User entry: "spark-Sandbox@KRB.HDP"
DEBUG	2016-03-16 11:12:09,565	6897	org.apache.hadoop.security.UserGroupInformation	[main]	UGI loginUser:spark-Sandbox@KRB.HDP (auth:KERBEROS)
DEBUG	2016-03-16 11:12:09,567	6899	org.apache.hadoop.security.UserGroupInformation	[TGT Renewer for spark-Sandbox@KRB.HDP]	Found tgt Ticket (hex) = 

whereas logging in from code (with NO kinit) currently gives me this:

DEBUG	2016-03-16 11:09:58,902	7194	org.apache.hadoop.security.Groups	[main]	Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
>>>KinitOptions cache name is C:\Users\davidtam\krb5cc_davidtam
>> Acquire default native Credentials
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 17 16 23.
>>> Found no TGT's in LSA
DEBUG	2016-03-16 11:09:58,910	7202	org.apache.hadoop.security.UserGroupInformation	[main]	hadoop login
DEBUG	2016-03-16 11:09:58,910	7202	org.apache.hadoop.security.UserGroupInformation	[main]	hadoop login commit
DEBUG	2016-03-16 11:09:58,911	7203	org.apache.hadoop.security.UserGroupInformation	[main]	using kerberos user:null
DEBUG	2016-03-16 11:09:58,912	7204	org.apache.hadoop.security.UserGroupInformation	[main]	using local user:NTUserPrincipal: davidtam
DEBUG	2016-03-16 11:09:58,912	7204	org.apache.hadoop.security.UserGroupInformation	[main]	Using user: "NTUserPrincipal: davidtam" with name davidtam
DEBUG	2016-03-16 11:09:58,912	7204	org.apache.hadoop.security.UserGroupInformation	[main]	User entry: "davidtam"
DEBUG	2016-03-16 11:09:58,914	7206	org.apache.hadoop.security.UserGroupInformation	[main]	UGI loginUser:davidtam (auth:KERBEROS)
INFO	2016-03-16 11:09:58,931	7223	hive.metastore	[main]	Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
DEBUG	2016-03-16 11:09:58,963	7255	org.apache.hadoop.security.UserGroupInformation	[main]	PrivilegedAction as:c009003 (auth:KERBEROS) from:org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
DEBUG	2016-03-16 11:09:58,963	7255	org.apache.thrift.transport.TSaslTransport	[main]	opening transport org.apache.thrift.transport.TSaslClientTransport@7c206b14
>>>KinitOptions cache name is C:\Users\davidtam\krb5cc_davidtam
>> Acquire default native Credentials
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 17 16 23.
>>> Found no TGT's in LSA

I am running on Windows, connecting to the sandbox.

1 ACCEPTED SOLUTION


Did you try using the properties below as command-line parameters when running spark-submit? Also, there were a few Kerberos-related issues in Spark 1.4 and 1.5, so it's better to try this on the Spark 1.6 release.

--principal <principal name> --keytab /etc/security/keytabs/spark.keytab
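
As a rough illustration of how those two options might be combined with the local master from the question, here is a minimal Java sketch that only assembles the spark-submit argument list and prints it. The principal and keytab path are the ones mentioned in this thread; the class name com.example.HiveJob and the jar name hive-job.jar are hypothetical placeholders, and the sketch assumes a Spark version in which these options take effect in local mode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SparkSubmitKerberosArgs {

    // Assembles a spark-submit argument list that passes a principal and keytab,
    // so the job does not depend on a ticket cache created by kinit.
    static List<String> build(String principal, String keytab,
                              String mainClass, String appJar) {
        if (principal == null || principal.isEmpty()) {
            throw new IllegalArgumentException("principal is required");
        }
        if (keytab == null || keytab.isEmpty()) {
            throw new IllegalArgumentException("keytab is required");
        }
        return new ArrayList<>(Arrays.asList(
                "spark-submit",
                "--master", "local[*]",      // local mode, as in the question
                "--principal", principal,
                "--keytab", keytab,
                "--class", mainClass,        // hypothetical application class
                appJar));                    // hypothetical application jar
    }

    public static void main(String[] args) {
        List<String> cmd = build(
                "spark-Sandbox@KRB.HDP",
                "/etc/security/keytabs/spark.keytab",
                "com.example.HiveJob",
                "hive-job.jar");
        // Print the command rather than running it; launching it could be done
        // with new ProcessBuilder(cmd).inheritIO().start() on a machine with Spark.
        System.out.println(String.join(" ", cmd));
    }
}
```

The sketch deliberately stops at printing the command, since whether the login then succeeds depends on the Spark version and the cluster setup.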


@Jitendra Yadav thanks for your reply. I believe these options are for YARN, whereas I am trying to run with master = local[*], similar to spark-shell on the sandbox.

I am using Spark 1.5.2 on HDP 2.3.4.


@David Tam

The same options should work in local mode as well. They were initially implemented for YARN only and were later made applicable to local mode. As I said earlier, it's better to try this on Spark 1.6.

Please refer to this JIRA and its pull requests: https://issues.apache.org/jira/browse/SPARK-11821


@Jitendra Yadav thanks, I just had a look at the JIRA. It looks like I will need to wait until we upgrade to Spark 1.6, then.

Thanks!