Created 03-15-2016 05:27 PM
Hello,
I need to run a Spark (1.5.2) job in a Kerberized environment (I am currently testing on the HDP 2.3.4 sandbox). The job needs to be able to read from and write to Hive (I am using HiveContext). I am also using master = local[*], which is similar to spark-shell.
I am able to do this in Spark by running kinit beforehand. However, is there any other way to authenticate programmatically within the Spark job?
e.g. I am able to read/write to Kerberized HDFS by running the following before the Spark code, without kinit. Is there something similar I can do for Hive?
// following works for HDFS, but not for Hive
System.setProperty("java.security.krb5.conf", krb5ConfPath);
final Configuration newConf = new Configuration();
newConf.set(SERVER_PRINCIPAL_KEY, "spark-Sandbox@KRB.HDP");
newConf.set(SERVER_KEYTAB_KEY, keyTabPath);
LOG.info("Logging in now... ******************* THIS REPLACE kinit **************************");
org.apache.hadoop.security.SecurityUtil.login(newConf, SERVER_KEYTAB_KEY, SERVER_PRINCIPAL_KEY, "sandbox.hortonworks.com");
LOG.info("Logged in !!! ******************* THIS REPLACE kinit **************************");
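For reference, the same keytab login can also be done directly through UserGroupInformation (SecurityUtil.login delegates to it). A minimal sketch of that variant; the principal and keytab path are placeholders from my setup, and this is not a confirmed fix for the Hive side:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Point the JVM at the right krb5.conf before any Kerberos activity.
System.setProperty("java.security.krb5.conf", krb5ConfPath);

// Tell Hadoop security to use Kerberos instead of simple auth.
final Configuration conf = new Configuration();
conf.set("hadoop.security.authentication", "kerberos");
UserGroupInformation.setConfiguration(conf);

// Log in from the keytab; this sets the static login user that
// HDFS (and, in principle, the Hive metastore client) should pick up.
UserGroupInformation.loginUserFromKeytab("spark-Sandbox@KRB.HDP", keyTabPath);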
Thanks in advance.
UPDATE:
I have enabled lots of logging and tracked it down to the following differences in the log:
with kinit I get:
DEBUG 2016-03-16 11:12:09,557 6889 org.apache.hadoop.security.Groups [main] Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
>>> KrbCreds found the default ticket granting ticket in credential cache.
>>> Obtained TGT from LSA: Credentials:
      client=spark-Sandbox@KRB.HDP
      server=krbtgt/KRB.HDP@KRB.HDP
      authTime=20160316111142Z
      endTime=20160317111142Z
      renewTill=null
      flags=FORWARDABLE;INITIAL
      EType (skey)=17 (tkt key)=18
DEBUG 2016-03-16 11:12:09,560 6892 org.apache.hadoop.security.UserGroupInformation [main] hadoop login
DEBUG 2016-03-16 11:12:09,561 6893 org.apache.hadoop.security.UserGroupInformation [main] hadoop login commit
DEBUG 2016-03-16 11:12:09,562 6894 org.apache.hadoop.security.UserGroupInformation [main] using kerberos user:spark-Sandbox@KRB.HDP
DEBUG 2016-03-16 11:12:09,562 6894 org.apache.hadoop.security.UserGroupInformation [main] Using user: "spark-Sandbox@KRB.HDP" with name spark-Sandbox@KRB.HDP
DEBUG 2016-03-16 11:12:09,562 6894 org.apache.hadoop.security.UserGroupInformation [main] User entry: "spark-Sandbox@KRB.HDP"
DEBUG 2016-03-16 11:12:09,565 6897 org.apache.hadoop.security.UserGroupInformation [main] UGI loginUser:spark-Sandbox@KRB.HDP (auth:KERBEROS)
DEBUG 2016-03-16 11:12:09,567 6899 org.apache.hadoop.security.UserGroupInformation [TGT Renewer for spark-Sandbox@KRB.HDP] Found tgt Ticket (hex) =
whereas logging in with code (and NO kinit) gets me these:
DEBUG 2016-03-16 11:09:58,902 7194 org.apache.hadoop.security.Groups [main] Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
>>>KinitOptions cache name is C:\Users\davidtam\krb5cc_davidtam
>> Acquire default native Credentials
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 17 16 23.
>>> Found no TGT's in LSA
DEBUG 2016-03-16 11:09:58,910 7202 org.apache.hadoop.security.UserGroupInformation [main] hadoop login
DEBUG 2016-03-16 11:09:58,910 7202 org.apache.hadoop.security.UserGroupInformation [main] hadoop login commit
DEBUG 2016-03-16 11:09:58,911 7203 org.apache.hadoop.security.UserGroupInformation [main] using kerberos user:null
DEBUG 2016-03-16 11:09:58,912 7204 org.apache.hadoop.security.UserGroupInformation [main] using local user:NTUserPrincipal: davidtam
DEBUG 2016-03-16 11:09:58,912 7204 org.apache.hadoop.security.UserGroupInformation [main] Using user: "NTUserPrincipal: davidtam" with name davidtam
DEBUG 2016-03-16 11:09:58,912 7204 org.apache.hadoop.security.UserGroupInformation [main] User entry: "davidtam"
DEBUG 2016-03-16 11:09:58,914 7206 org.apache.hadoop.security.UserGroupInformation [main] UGI loginUser:davidtam (auth:KERBEROS)
INFO 2016-03-16 11:09:58,931 7223 hive.metastore [main] Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
DEBUG 2016-03-16 11:09:58,963 7255 org.apache.hadoop.security.UserGroupInformation [main] PrivilegedAction as:c009003 (auth:KERBEROS) from:org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
DEBUG 2016-03-16 11:09:58,963 7255 org.apache.thrift.transport.TSaslTransport [main] opening transport org.apache.thrift.transport.TSaslClientTransport@7c206b14
>>>KinitOptions cache name is C:\Users\davidtam\krb5cc_davidtam
>> Acquire default native Credentials
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 17 16 23.
>>> Found no TGT's in LSA
I am running on Windows, connecting to the sandbox.
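To make the difference above easy to see, here is a small diagnostic sketch that prints what UGI thinks the login user is right after the login call (plain Hadoop API, nothing Spark- or Hive-specific):

import org.apache.hadoop.security.UserGroupInformation;

// With kinit this prints the Kerberos principal; with the code-only
// login on Windows it falls back to the local NT user instead.
UserGroupInformation ugi = UserGroupInformation.getLoginUser();
System.out.println("login user     = " + ugi.getUserName());
System.out.println("auth method    = " + ugi.getAuthenticationMethod());
System.out.println("kerberos creds = " + ugi.hasKerberosCredentials());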
Created 03-16-2016 05:30 AM
Did you try using the properties below as command-line parameters while running spark-submit? Also, there were a few issues related to Spark Kerberos support in Spark 1.4 and 1.5, so it's better to try this on the Spark 1.6 release.
--principal <principal name> --keytab /etc/security/keytabs/spark.keytab
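A full invocation would look something like this (the class and jar names are placeholders; the principal is the one from your snippet):

spark-submit --class com.example.MySparkJob --master local[*] --principal spark-Sandbox@KRB.HDP --keytab /etc/security/keytabs/spark.keytab my-spark-job.jar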
Created 03-16-2016 01:00 PM
@Jitendra Yadav thanks for your reply. I believe these are for YARN, while I am trying to run with master = local[*], similar to spark-shell on the sandbox.
I am using Spark 1.5.2 on HDP 2.3.4.
Created 03-16-2016 01:30 PM
The same confs should work for local mode also; initially they were made for YARN only, but later they became applicable to local mode as well. As I said earlier, it's better to try this on the Spark 1.6 release.
Please refer to this JIRA and its pull requests: https://issues.apache.org/jira/browse/SPARK-11821
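The principal and keytab can also be supplied as Spark confs instead of CLI flags. A hedged sketch of that approach (spark.yarn.principal and spark.yarn.keytab are the conf equivalents of --principal/--keytab; whether they take effect in local[*] mode depends on the Spark version, per SPARK-11821, and the paths here are placeholders):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Set the principal/keytab programmatically rather than on the
// spark-submit command line.
SparkConf conf = new SparkConf()
        .setMaster("local[*]")
        .setAppName("kerberos-hive-test")
        .set("spark.yarn.principal", "spark-Sandbox@KRB.HDP")
        .set("spark.yarn.keytab", "/etc/security/keytabs/spark.keytab");
JavaSparkContext sc = new JavaSparkContext(conf);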
Created 03-16-2016 02:14 PM
@Jitendra Yadav thanks, I just had a look at the JIRA. I think in that case I will need to wait until we upgrade to Spark 1.6.
Thanks!