Accessing Hive from Spark without using kinit

Hello,

I need to run a Spark (1.5.2) job in a Kerberized environment (I am currently testing on the HDP 2.3.4 sandbox). The job needs to read from and write to Hive (I am using HiveContext). I am also using master = local[*], which is similar to spark-shell.

I am able to do this in Spark by running kinit beforehand. However, is there any other way to authenticate programmatically within the Spark job?

For example, I am able to read and write on Kerberized HDFS by running the following before the Spark code, without kinit. Is there something similar I can do for Hive?

// The following works for HDFS, but not for Hive.
// Configuration is org.apache.hadoop.conf.Configuration.
// SERVER_PRINCIPAL_KEY and SERVER_KEYTAB_KEY are configuration key names defined elsewhere.
System.setProperty("java.security.krb5.conf", krb5ConfPath);
final Configuration newConf = new Configuration();
newConf.set(SERVER_PRINCIPAL_KEY, "spark-Sandbox@KRB.HDP");
newConf.set(SERVER_KEYTAB_KEY, keyTabPath);
LOG.info("Logging in now... ******************* THIS REPLACE kinit **************************");
// Log in from the keytab and principal stored under the two keys set above.
org.apache.hadoop.security.SecurityUtil.login(newConf, SERVER_KEYTAB_KEY, SERVER_PRINCIPAL_KEY, "sandbox.hortonworks.com");
LOG.info("Logged  in !!!    ******************* THIS REPLACE kinit **************************");

Thanks in advance.

UPDATE:

I have enabled lots of logging and tracked it down to the following differences in the log:

with kinit I get:

DEBUG	2016-03-16 11:12:09,557	6889	org.apache.hadoop.security.Groups	[main]	Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
>>> KrbCreds found the default ticket granting ticket in credential cache.
>>> Obtained TGT from LSA: Credentials:
      client=spark-Sandbox@KRB.HDP
      server=krbtgt/KRB.HDP@KRB.HDP
    authTime=20160316111142Z
     endTime=20160317111142Z
   renewTill=null
       flags=FORWARDABLE;INITIAL
EType (skey)=17
   (tkt key)=18
DEBUG	2016-03-16 11:12:09,560	6892	org.apache.hadoop.security.UserGroupInformation	[main]	hadoop login
DEBUG	2016-03-16 11:12:09,561	6893	org.apache.hadoop.security.UserGroupInformation	[main]	hadoop login commit
DEBUG	2016-03-16 11:12:09,562	6894	org.apache.hadoop.security.UserGroupInformation	[main]	using kerberos user:spark-Sandbox@KRB.HDP
DEBUG	2016-03-16 11:12:09,562	6894	org.apache.hadoop.security.UserGroupInformation	[main]	Using user: "spark-Sandbox@KRB.HDP" with name spark-Sandbox@KRB.HDP
DEBUG	2016-03-16 11:12:09,562	6894	org.apache.hadoop.security.UserGroupInformation	[main]	User entry: "spark-Sandbox@KRB.HDP"
DEBUG	2016-03-16 11:12:09,565	6897	org.apache.hadoop.security.UserGroupInformation	[main]	UGI loginUser:spark-Sandbox@KRB.HDP (auth:KERBEROS)
DEBUG	2016-03-16 11:12:09,567	6899	org.apache.hadoop.security.UserGroupInformation	[TGT Renewer for spark-Sandbox@KRB.HDP]	Found tgt Ticket (hex) = 

whereas logging in from code (with NO kinit) currently gives me this:

DEBUG	2016-03-16 11:09:58,902	7194	org.apache.hadoop.security.Groups	[main]	Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
>>>KinitOptions cache name is C:\Users\davidtam\krb5cc_davidtam
>> Acquire default native Credentials
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 17 16 23.
>>> Found no TGT's in LSA
DEBUG	2016-03-16 11:09:58,910	7202	org.apache.hadoop.security.UserGroupInformation	[main]	hadoop login
DEBUG	2016-03-16 11:09:58,910	7202	org.apache.hadoop.security.UserGroupInformation	[main]	hadoop login commit
DEBUG	2016-03-16 11:09:58,911	7203	org.apache.hadoop.security.UserGroupInformation	[main]	using kerberos user:null
DEBUG	2016-03-16 11:09:58,912	7204	org.apache.hadoop.security.UserGroupInformation	[main]	using local user:NTUserPrincipal: davidtam
DEBUG	2016-03-16 11:09:58,912	7204	org.apache.hadoop.security.UserGroupInformation	[main]	Using user: "NTUserPrincipal: davidtam" with name davidtam
DEBUG	2016-03-16 11:09:58,912	7204	org.apache.hadoop.security.UserGroupInformation	[main]	User entry: "davidtam"
DEBUG	2016-03-16 11:09:58,914	7206	org.apache.hadoop.security.UserGroupInformation	[main]	UGI loginUser:davidtam (auth:KERBEROS)
INFO	2016-03-16 11:09:58,931	7223	hive.metastore	[main]	Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
DEBUG	2016-03-16 11:09:58,963	7255	org.apache.hadoop.security.UserGroupInformation	[main]	PrivilegedAction as:c009003 (auth:KERBEROS) from:org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
DEBUG	2016-03-16 11:09:58,963	7255	org.apache.thrift.transport.TSaslTransport	[main]	opening transport org.apache.thrift.transport.TSaslClientTransport@7c206b14
>>>KinitOptions cache name is C:\Users\davidtam\krb5cc_davidtam
>> Acquire default native Credentials
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 17 16 23.
>>> Found no TGT's in LSA

I am running on Windows, connecting to the sandbox.
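The decisive difference between the two traces is the `using kerberos user:` line: it carries the principal after kinit and `null` after the in-code login. As a rough way to check this mechanically, here is a minimal sketch that scans captured log lines for that marker. The class and method names (`KerberosLoginCheck`, `kerberosUser`) are made up for illustration and are not part of Hadoop or Spark; the sample lines are shortened from the logs above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not a Hadoop or Spark API.
public final class KerberosLoginCheck {
    // Marker text emitted by UserGroupInformation at DEBUG level (taken from the logs above).
    private static final String MARKER = "using kerberos user:";

    /**
     * Returns the principal from the first "using kerberos user:" line,
     * or empty if no such line exists or the line reports "null".
     */
    public static Optional<String> kerberosUser(List<String> logLines) {
        for (String line : logLines) {
            int idx = line.indexOf(MARKER);
            if (idx >= 0) {
                String user = line.substring(idx + MARKER.length()).trim();
                if (user.isEmpty() || user.equals("null")) {
                    return Optional.empty();
                }
                return Optional.of(user);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> withKinit = Arrays.asList(
            "DEBUG ... [main]\thadoop login commit",
            "DEBUG ... [main]\tusing kerberos user:spark-Sandbox@KRB.HDP");
        List<String> withoutKinit = Arrays.asList(
            "DEBUG ... [main]\thadoop login commit",
            "DEBUG ... [main]\tusing kerberos user:null");

        System.out.println(kerberosUser(withKinit).orElse("<no Kerberos user>"));
        System.out.println(kerberosUser(withoutKinit).orElse("<no Kerberos user>"));
    }
}
```

An empty result only says that the Hadoop login did not pick up Kerberos credentials; it does not explain why.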

1 ACCEPTED SOLUTION

Did you try passing the properties below as command-line parameters when running spark-submit? Also, there were a few Kerberos-related issues in Spark 1.4 and 1.5, so it is better to try this on the Spark 1.6 release.

--principal <principal name> --keytab /etc/security/keytabs/spark.keytab
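For anyone launching spark-submit from Java, the full command line with these two options might be assembled roughly as follows. This is only a sketch: the class `SparkSubmitCommand`, the main class `com.example.HiveJob` and the jar name `hive-job.jar` are hypothetical placeholders, the principal is the one from the question, and `--master local[*]` reflects the local-mode setup described there.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that assembles a spark-submit command line.
public final class SparkSubmitCommand {

    /** Builds the argument list; nothing is executed here. */
    public static List<String> build(String principal, String keytab,
                                     String mainClass, String appJar) {
        return new ArrayList<>(Arrays.asList(
            "spark-submit",
            "--master", "local[*]",      // local mode, as in the question
            "--principal", principal,    // Kerberos principal to log in as
            "--keytab", keytab,          // keytab file for that principal
            "--class", mainClass,        // placeholder application class
            appJar));                    // placeholder application jar
    }

    public static void main(String[] args) {
        List<String> cmd = build(
            "spark-Sandbox@KRB.HDP",
            "/etc/security/keytabs/spark.keytab",
            "com.example.HiveJob",
            "hive-job.jar");
        // Print the command for inspection instead of running it.
        System.out.println(String.join(" ", cmd));
        // To actually launch it: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```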

@Jitendra Yadav thanks for your reply. I believe these options are for YARN, whereas I am trying to run with master = local[*], similar to spark-shell on the sandbox.

I am using Spark 1.5.2 on HDP 2.3.4.

@David Tam

The same settings should also work in local mode. They were initially implemented for YARN only and were later made applicable to local mode as well. As I said earlier, it is better to try this on Spark 1.6.

Please refer to this JIRA and its pull requests: https://issues.apache.org/jira/browse/SPARK-11821

@Jitendra Yadav thanks, I just had a look at the JIRA. I think in this case I will need to wait until we upgrade to Spark 1.6.

Thanks!