Expert Contributor
Posts: 74
Registered: ‎11-24-2017

Spark SQL action fails in Kerberos secured cluster


Hello everyone

 

I need to run a Spark SQL action on a Hive table, but I am having authentication problems (the cluster is Kerberos-secured).

I first tried with hive2 credentials, because they work with my other hive2 actions, but I got a failure (I suppose this type of credential can only be used with hive2 actions?):

 

2018-04-06 08:37:21,831 [Driver] INFO  org.apache.hadoop.hive.ql.session.SessionState  - No Tez session required at this point. hive.execution.engine=mr.
2018-04-06 08:37:22,117 [Driver] INFO  hive.metastore  - Trying to connect to metastore with URI thrift://trmas-fc2d552a.azcloud.local:9083
2018-04-06 08:37:22,153 [Driver] ERROR org.apache.thrift.transport.TSaslTransport  - SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
	at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
[...]

I've also tried with hcat credentials, but with those the action goes into a START_RETRY state with the following error:

 

JA009: org.apache.hive.hcatalog.common.HCatException : 9001 : Exception occurred while processing HCat request : TException while getting delegation token.. Cause : org.apache.thrift.transport.TTransportException

This is the workflow.xml:

 

<workflow-app
    xmlns="uri:oozie:workflow:0.5" name="oozie_spark_wf">
    <credentials>
        <credential name="hive2_credentials" type="hive2">
            <property>
                <name>hive2.jdbc.url</name>
                <value>jdbc:hive2://trmas-fc2d552a.azcloud.local:10000/default;ssl=true</value>
            </property>
            <property>
                <name>hive2.server.principal</name>
                <value>hive/trmas-fc2d552a.azcloud.local@AZCLOUD.LOCAL</value>
            </property>
        </credential>
        <credential name="hcat_cred" type="hcat">
            <property>
                <name>hcat.metastore.uri</name>
                <value>thrift://trmas-fc2d552a.azcloud.local:9083</value>
            </property>
            <property>
                <name>hcat.metastore.principal</name>
                <value>hive/trmas-fc2d552a.azcloud.local@AZCLOUD.LOCAL</value>
            </property>
        </credential>
    </credentials>
    <start to="spark_action"/>
    <action cred="hcat_cred" name="spark_action">
        <spark
            xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/icon0104/output"/>
            </prepare>
            <master>yarn-cluster</master>
            <mode>cluster</mode>
            <name>OozieSpark</name>
            <class>my.Main</class>
            <jar>/home/icon0104/oozie/ooziespark/lib/ooziespark-1.0.jar</jar>
            <spark-opts>--files ${nameNode}/user/icon0104/oozie/ooziespark/hive-site.xml</spark-opts>
        </spark>
        <ok to="END_NODE"/>
        <error to="KILL_NODE"/>
    </action>
    <kill name="KILL_NODE">
        <message>${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="END_NODE"/>
</workflow-app>

 

This is the hive-site.xml:

 

<?xml version="1.0" encoding="UTF-8"?>

<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://trmas-fc2d552a.azcloud.local:9083</value>
  </property>
  <property>
    <name>hive.metastore.client.socket.timeout</name>
    <value>300</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.warehouse.subdir.inherit.perms</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.auto.convert.join</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.auto.convert.join.noconditionaltask.size</name>
    <value>20971520</value>
  </property>
  <property>
    <name>hive.optimize.bucketmapjoin.sortedmerge</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.smbjoin.cache.rows</name>
    <value>10000</value>
  </property>
  <property>
    <name>hive.server2.logging.operation.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/var/log/hive/operation_logs</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>-1</value>
  </property>
  <property>
    <name>hive.exec.reducers.bytes.per.reducer</name>
    <value>67108864</value>
  </property>
  <property>
    <name>hive.exec.copyfile.maxsize</name>
    <value>33554432</value>
  </property>
  <property>
    <name>hive.exec.reducers.max</name>
    <value>1099</value>
  </property>
  <property>
    <name>hive.vectorized.groupby.checkinterval</name>
    <value>4096</value>
  </property>
  <property>
    <name>hive.vectorized.groupby.flush.percent</name>
    <value>0.1</value>
  </property>
  <property>
    <name>hive.compute.query.using.stats</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.vectorized.execution.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.vectorized.execution.reduce.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.merge.mapfiles</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.merge.mapredfiles</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.cbo.enable</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.fetch.task.conversion</name>
    <value>minimal</value>
  </property>
  <property>
    <name>hive.fetch.task.conversion.threshold</name>
    <value>268435456</value>
  </property>
  <property>
    <name>hive.limit.pushdown.memory.usage</name>
    <value>0.1</value>
  </property>
  <property>
    <name>hive.merge.sparkfiles</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.merge.smallfiles.avgsize</name>
    <value>16777216</value>
  </property>
  <property>
    <name>hive.merge.size.per.task</name>
    <value>268435456</value>
  </property>
  <property>
    <name>hive.optimize.reducededuplication</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.optimize.reducededuplication.min.reducer</name>
    <value>4</value>
  </property>
  <property>
    <name>hive.map.aggr</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.map.aggr.hash.percentmemory</name>
    <value>0.5</value>
  </property>
  <property>
    <name>hive.optimize.sort.dynamic.partition</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.execution.engine</name>
    <value>mr</value>
  </property>
  <property>
    <name>spark.executor.memory</name>
    <value>268435456</value>
  </property>
  <property>
    <name>spark.driver.memory</name>
    <value>268435456</value>
  </property>
  <property>
    <name>spark.executor.cores</name>
    <value>4</value>
  </property>
  <property>
    <name>spark.yarn.driver.memoryOverhead</name>
    <value>26</value>
  </property>
  <property>
    <name>spark.yarn.executor.memoryOverhead</name>
    <value>26</value>
  </property>
  <property>
    <name>spark.dynamicAllocation.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>spark.dynamicAllocation.initialExecutors</name>
    <value>1</value>
  </property>
  <property>
    <name>spark.dynamicAllocation.minExecutors</name>
    <value>1</value>
  </property>
  <property>
    <name>spark.dynamicAllocation.maxExecutors</name>
    <value>2147483647</value>
  </property>
  <property>
    <name>hive.metastore.execute.setugi</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.support.concurrency</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.zookeeper.quorum</name>
    <value>trmas-6b8bc78c.azcloud.local,trmas-c9471d78.azcloud.local,trmas-fc2d552a.azcloud.local</value>
  </property>
  <property>
    <name>hive.zookeeper.client.port</name>
    <value>2181</value>
  </property>
  <property>
    <name>hive.zookeeper.namespace</name>
    <value>hive_zookeeper_namespace_CD-HIVE-LTqXUcrR</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>trmas-6b8bc78c.azcloud.local,trmas-c9471d78.azcloud.local,trmas-fc2d552a.azcloud.local</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hive.cluster.delegation.token.store.class</name>
    <value>org.apache.hadoop.hive.thrift.MemoryTokenStore</value>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/_HOST@AZCLOUD.LOCAL</value>
  </property>
  <property>
    <name>hive.server2.authentication.kerberos.principal</name>
    <value>hive/_HOST@AZCLOUD.LOCAL</value>
  </property>
  <property>
    <name>hive.server2.use.SSL</name>
    <value>true</value>
  </property>
  <property>
    <name>spark.shuffle.service.enabled</name>
    <value>true</value>
  </property>
</configuration>

 

 

In Oozie configuration I have the following credentials classes enabled:

 

hcat=org.apache.oozie.action.hadoop.HCatCredentials,hbase=org.apache.oozie.action.hadoop.HbaseCredentials,hive2=org.apache.oozie.action.hadoop.Hive2Credentials 

 

Can anyone help? What am I missing?

 

 

Posts: 519
Topics: 14
Kudos: 90
Solutions: 45
Registered: ‎09-02-2016

Re: Spark SQL action fails in Kerberos secured cluster

@ludof

 

All you have to do is run the kinit command and give your Kerberos password before you start your Spark session, then continue with your steps; it will be fixed.
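For example, something like the following (the principal name is just an illustration; use your own, and note these commands only work against a reachable KDC):

```shell
# Obtain a Kerberos TGT interactively (prompts for the password)
kinit icon0104@AZCLOUD.LOCAL

# Verify that a valid ticket was granted and check its expiry time
klist
```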

Expert Contributor
Posts: 74
Registered: ‎11-24-2017

Re: Spark SQL action fails in Kerberos secured cluster

Hi @saranvisa, thanks for the answer.

 

Do I need to do this every time before running the Oozie Spark action? This is a coordinator-scheduled workflow that I need to run several times per day.

Explorer
Posts: 10
Registered: ‎11-11-2014

Re: Spark SQL action fails in Kerberos secured cluster

Another option would be to: 1. create a keytab file; 2. include kinit with the keytab as part of the user profile, so that every time the job runs it will kinit automatically and obtain a valid ticket.
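A sketch of that idea, assuming a keytab has already been created (the keytab path and principal below are illustrative):

```shell
# Non-interactive kinit using a keytab; suitable for a login profile or a job wrapper script
kinit -kt "$HOME/icon0104.keytab" icon0104@AZCLOUD.LOCAL
```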
Posts: 519
Topics: 14
Kudos: 90
Solutions: 45
Registered: ‎09-02-2016

Re: Spark SQL action fails in Kerberos secured cluster

@ludof

 

No need to do it every time: in general, once you have done kinit, the ticket is valid for 24 hours (you can customize this if you want). So do it once a day manually, or automate it with a cron job in some scenarios.

E.g. you have jobs running round the clock, more than one user is using the same user/batch ID for a project, etc.
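The cron-based renewal mentioned above could look like this (a crontab entry; the keytab path, principal, and 12-hour interval are illustrative and assume a keytab is available):

```shell
# Renew the Kerberos TGT every 12 hours from a keytab, well within the 24-hour ticket lifetime
0 */12 * * * kinit -kt /home/icon0104/icon0104.keytab icon0104@AZCLOUD.LOCAL
```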

Expert Contributor
Posts: 74
Registered: ‎11-24-2017

Re: Spark SQL action fails in Kerberos secured cluster

@saranvisa @suresh.sethu

 

I've tried to kinit before launching Oozie Spark action in yarn-cluster mode but it fails anyway.

In the logs I found a lot of the following warnings:

 

2018-04-16 10:59:05,874 [main] INFO  org.apache.spark.deploy.yarn.YarnSparkHadoopUtil  - getting token for namenode: hdfs://hanameservice/user/icon0104/.sparkStaging/application_1523441517429_3067
2018-04-16 10:59:06,004 [main] WARN  org.apache.hadoop.security.UserGroupInformation  - PriviledgedActionException as:icon0104 (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error

and the following exception:

 

diagnostics: User class threw exception: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

 

I've also tried to run the spark program directly from the shell with spark-submit using --master yarn-cluster but got the following error:

 

Delegation Token can be issued only with kerberos or web authentication

 

Any idea how to solve this?

 

 

 

 

Posts: 519
Topics: 14
Kudos: 90
Solutions: 45
Registered: ‎09-02-2016

Re: Spark SQL action fails in Kerberos secured cluster

@ludof

 

Please try to follow the input from the link below (read all the comments to the end by pressing "show more"); a similar issue has been discussed there and it may help you.

 

https://stackoverflow.com/questions/44376334/how-to-fix-delegation-token-can-be-issued-only-with-ker...

 

 

Expert Contributor
Posts: 74
Registered: ‎11-24-2017

Re: Spark SQL action fails in Kerberos secured cluster

@saranvisa

 

Thank you. If I understand correctly, I need to provide a keytab file on HDFS and pass it as a file in the Oozie Spark action. What I am missing is how to generate this keytab file as a non-privileged user. I can kinit, but I have no privileges for the kadmin command. Do I need to contact an administrator, or are there other ways to get this keytab file?

Posts: 519
Topics: 14
Kudos: 90
Solutions: 45
Registered: ‎09-02-2016

Re: Spark SQL action fails in Kerberos secured cluster

@ludof

 

Yes, in general developers will not have access to create a keytab; you have to contact your admin for that. (Mostly the admin should have permission to create one for you, but some organizations have a dedicated security team to handle LDAP, AD, Kerberos, etc. It depends on your organization, but you have to start with your admin.)
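Once the admin has provided a keytab, one common way to use it with a cluster-mode Spark job is spark-submit's --principal and --keytab options on YARN, which let Spark renew delegation tokens itself for long-running jobs (the keytab path below is illustrative; the class and jar are from the workflow above):

```shell
# Submit in cluster mode with an explicit Kerberos principal and keytab,
# so the application can obtain and renew its own tokens
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal icon0104@AZCLOUD.LOCAL \
  --keytab /home/icon0104/icon0104.keytab \
  --class my.Main \
  /home/icon0104/oozie/ooziespark/lib/ooziespark-1.0.jar
```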

Explorer
Posts: 10
Registered: ‎11-11-2014

Re: Spark SQL action fails in Kerberos secured cluster

@ludof

 

From the log below, it looks like the connection is being established with simple auth and not Kerberos auth.

 

PriviledgedActionException as:icon0104 (auth:SIMPLE) 

 
