Created 03-15-2016 05:27 PM
Hello,
I need to run a Spark (1.5.2) job in a Kerberized environment (I am currently testing on the HDP 2.3.4 sandbox). The job needs to be able to read from and write to Hive (I am using HiveContext). I am also using master = local[*], which is similar to spark-shell.
I am able to do this in Spark by running kinit beforehand. However, is there any other way to authenticate programmatically within the Spark job?
For example, I am able to read from and write to Kerberized HDFS by running the following before the Spark code, without kinit. Is there something similar I can do for Hive?
// The following works for HDFS, but not for Hive
System.setProperty("java.security.krb5.conf", krb5ConfPath);
final Configuration newConf = new Configuration();
newConf.set(SERVER_PRINCIPAL_KEY, "spark-Sandbox@KRB.HDP");
newConf.set(SERVER_KEYTAB_KEY, keyTabPath);
LOG.info("Logging in now... ******************* THIS REPLACE kinit **************************");
// Keytab login: the arguments are the config keys for the keytab and the principal, then the hostname
org.apache.hadoop.security.SecurityUtil.login(newConf, SERVER_KEYTAB_KEY, SERVER_PRINCIPAL_KEY, "sandbox.hortonworks.com");
LOG.info("Logged  in !!!    ******************* THIS REPLACE kinit **************************");
Thanks in advance.
UPDATE:
I have enabled lots of logging and tracked it down to the following differences in the log:
with kinit I get:
DEBUG	2016-03-16 11:12:09,557	6889	org.apache.hadoop.security.Groups	[main]	Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
>>> KrbCreds found the default ticket granting ticket in credential cache.
>>> Obtained TGT from LSA: Credentials:
      client=spark-Sandbox@KRB.HDP
      server=krbtgt/KRB.HDP@KRB.HDP
    authTime=20160316111142Z
     endTime=20160317111142Z
   renewTill=null
       flags=FORWARDABLE;INITIAL
EType (skey)=17
   (tkt key)=18
DEBUG	2016-03-16 11:12:09,560	6892	org.apache.hadoop.security.UserGroupInformation	[main]	hadoop login
DEBUG	2016-03-16 11:12:09,561	6893	org.apache.hadoop.security.UserGroupInformation	[main]	hadoop login commit
DEBUG	2016-03-16 11:12:09,562	6894	org.apache.hadoop.security.UserGroupInformation	[main]	using kerberos user:spark-Sandbox@KRB.HDP
DEBUG	2016-03-16 11:12:09,562	6894	org.apache.hadoop.security.UserGroupInformation	[main]	Using user: "spark-Sandbox@KRB.HDP" with name spark-Sandbox@KRB.HDP
DEBUG	2016-03-16 11:12:09,562	6894	org.apache.hadoop.security.UserGroupInformation	[main]	User entry: "spark-Sandbox@KRB.HDP"
DEBUG	2016-03-16 11:12:09,565	6897	org.apache.hadoop.security.UserGroupInformation	[main]	UGI loginUser:spark-Sandbox@KRB.HDP (auth:KERBEROS)
DEBUG	2016-03-16 11:12:09,567	6899	org.apache.hadoop.security.UserGroupInformation	[TGT Renewer for spark-Sandbox@KRB.HDP]	Found tgt Ticket (hex) = 
whereas logging in from code (with NO kinit) currently gives me:
DEBUG 2016-03-16 11:09:58,902 7194 org.apache.hadoop.security.Groups [main] Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
>>>KinitOptions cache name is C:\Users\davidtam\krb5cc_davidtam
>> Acquire default native Credentials
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 17 16 23.
>>> Found no TGT's in LSA
DEBUG 2016-03-16 11:09:58,910 7202 org.apache.hadoop.security.UserGroupInformation [main] hadoop login
DEBUG 2016-03-16 11:09:58,910 7202 org.apache.hadoop.security.UserGroupInformation [main] hadoop login commit
DEBUG 2016-03-16 11:09:58,911 7203 org.apache.hadoop.security.UserGroupInformation [main] using kerberos user:null
DEBUG 2016-03-16 11:09:58,912 7204 org.apache.hadoop.security.UserGroupInformation [main] using local user:NTUserPrincipal: davidtam
DEBUG 2016-03-16 11:09:58,912 7204 org.apache.hadoop.security.UserGroupInformation [main] Using user: "NTUserPrincipal: davidtam" with name davidtam
DEBUG 2016-03-16 11:09:58,912 7204 org.apache.hadoop.security.UserGroupInformation [main] User entry: "davidtam"
DEBUG 2016-03-16 11:09:58,914 7206 org.apache.hadoop.security.UserGroupInformation [main] UGI loginUser:davidtam (auth:KERBEROS)
INFO 2016-03-16 11:09:58,931 7223 hive.metastore [main] Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
DEBUG 2016-03-16 11:09:58,963 7255 org.apache.hadoop.security.UserGroupInformation [main] PrivilegedAction as:c009003 (auth:KERBEROS) from:org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
DEBUG 2016-03-16 11:09:58,963 7255 org.apache.thrift.transport.TSaslTransport [main] opening transport org.apache.thrift.transport.TSaslClientTransport@7c206b14
>>>KinitOptions cache name is C:\Users\davidtam\krb5cc_davidtam
>> Acquire default native Credentials
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 17 16 23.
>>> Found no TGT's in LSA
I am running on Windows and connecting to the sandbox.
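For reference, the "Found no TGT's in LSA" lines appear to come from the JDK's Kerberos login looking for an existing ticket rather than reading a keytab. Below is a minimal sketch, using only JDK classes, of a JAAS configuration that points Krb5LoginModule at a keytab instead. The class name KeytabJaasConfig, the helper login() and the entry name "keytab-login" are made up for illustration, and this is an assumption about what a keytab login needs, not a confirmed fix: on its own it does not make the Hive metastore client use the resulting Subject.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.Subject;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;
import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;

// Hypothetical helper: an in-memory JAAS configuration for a keytab login.
final class KeytabJaasConfig extends Configuration {
    private final String principal;
    private final String keytabPath;

    KeytabJaasConfig(String principal, String keytabPath) {
        this.principal = principal;
        this.keytabPath = keytabPath;
    }

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        // The same single entry is returned for every application name.
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");       // read keys from the keytab
        options.put("keyTab", keytabPath);
        options.put("principal", principal);
        options.put("storeKey", "true");        // keep the keys in the Subject
        options.put("doNotPrompt", "true");     // never ask for a password
        options.put("useTicketCache", "false"); // do not look for a kinit ticket
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options)
        };
    }

    // Performs the login and returns the authenticated Subject.
    // Needs a reachable KDC and a valid keytab, so it is not exercised here.
    static Subject login(String principal, String keytabPath) throws LoginException {
        LoginContext context = new LoginContext(
            "keytab-login", null, null, new KeytabJaasConfig(principal, keytabPath));
        context.login();
        return context.getSubject();
    }
}
```

Calling KeytabJaasConfig.login("spark-Sandbox@KRB.HDP", keyTabPath) after setting java.security.krb5.conf would be the equivalent of kinit at the JDK level. Whether Hadoop and Hive pick up that login depends on how UserGroupInformation is initialised, which this sketch does not cover.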
Created 03-16-2016 05:30 AM
Did you try passing the properties below as command-line parameters when running spark-submit? There were also a few Kerberos-related issues in Spark 1.4 and 1.5, so it is better to try this on the Spark 1.6 release.
--principal <principal name> --keytab /etc/security/keytabs/spark.keytab
Created 03-16-2016 01:00 PM
@Jitendra Yadav thanks for your reply. I believe these options are for YARN, whereas I am trying to run with master = local[*], similar to spark-shell on the sandbox.
I am using Spark 1.5.2 on HDP 2.3.4.
Created 03-16-2016 01:30 PM
The same confs should work in local mode as well. They were initially implemented for YARN only and were later made applicable to local mode too. As I said earlier, it is better to try this on Spark 1.6.
Please refer to this JIRA and its pull requests: https://issues.apache.org/jira/browse/SPARK-11821
Created 03-16-2016 02:14 PM
@Jitendra Yadav thanks, I just had a look at the JIRA. In that case I think I will need to wait until we upgrade to Spark 1.6.
Thanks!