Spark HBase connector (SHC) on a Kerberized cluster

Hi,

I'm trying to run PySpark code with SHC (Spark HBase Connector) to read data from HBase on a secured (Kerberos) cluster.

Here is a simple example to illustrate:

# readExample.py
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlc = SQLContext(sc)

# Fully qualified name of the SHC data source
data_source_format = 'org.apache.spark.sql.execution.datasources.hbase'

# SHC catalog mapping HBase columns to DataFrame columns.
# split() followed by join() strips all whitespace from the JSON string.
catalog = ''.join("""{
    "table":{"namespace":"default", "name":"firsttable"},
    "rowkey":"key",
    "columns":{
        "firstcol":{"cf":"rowkey", "col":"key", "type":"string"},
        "secondcol":{"cf":"d", "col":"colname", "type":"string"}
    }
}""".split())

# Load the HBase table as a DataFrame and display one column
df = sqlc.read \
    .options(catalog=catalog) \
    .format(data_source_format) \
    .load()
df.select("secondcol").show()
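As a side note on the catalog string: the split/join trick removes every whitespace character, including any that might appear inside a value (a table or column name containing a space, for example). A minimal sketch of an alternative, assuming nothing beyond the standard library, is to build the catalog as a dictionary and serialize it with json.dumps. The helper name build_catalog and its parameters are hypothetical and only serve the illustration; the resulting string is meant to be passed to .options(catalog=catalog) exactly as before.

```python
import json


def build_catalog(namespace, table, columns, rowkey="key"):
    """Return an SHC-style catalog as a compact JSON string.

    `columns` maps each DataFrame column name to a
    (column family, qualifier, type) tuple. This helper is a
    hypothetical convenience, not part of SHC.
    """
    catalog = {
        "table": {"namespace": namespace, "name": table},
        "rowkey": rowkey,
        "columns": {
            name: {"cf": cf, "col": col, "type": col_type}
            for name, (cf, col, col_type) in columns.items()
        },
    }
    # Compact separators give a string without spaces between tokens,
    # while whitespace inside values (if any) is preserved.
    return json.dumps(catalog, separators=(",", ":"))


catalog = build_catalog(
    "default",
    "firsttable",
    {
        "firstcol": ("rowkey", "key", "string"),
        "secondcol": ("d", "colname", "string"),
    },
)
print(catalog)
```

Serializing from a dictionary also means a malformed catalog (missing brace, trailing comma) fails in Python at build time rather than inside the connector.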

To run it, I successfully executed the following command line:

spark-submit --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/hbase/conf/hbase-site.xml --keytab=/path/to/my/keytab/myuser.keytab --principal=myuser@DOMAIN.CORP readExample.py

which I guess is strictly equivalent to:

spark-submit --master local[*] --deploy-mode client --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/hbase/conf/hbase-site.xml --keytab=/path/to/my/keytab/myuser.keytab --principal=myuser@DOMAIN.CORP readExample.py

This works, but now I would like to run the same job on the cluster. I tried the following two options, and both failed:

1 - client mode

spark-submit --master yarn --deploy-mode client --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/hbase/conf/hbase-site.xml --keytab=/path/to/my/keytab/myuser.keytab --principal=myuser@DOMAIN.CORP readExample.py

The driver hangs, waiting for executors that fail to connect to HBase.

Logs from an executor:

17/10/23 16:28:10 ERROR AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
	at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
	at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:642)
	at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:166)

2 - cluster mode

spark-submit --master yarn --deploy-mode cluster --packages com.hortonworks:shc:1.0.0-1.6-s_2.10 --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/hbase/conf/hbase-site.xml --keytab=/path/to/my/keytab/myuser.keytab --principal=myuser@DOMAIN.CORP readExample1.py

Here, even the driver (that runs on the cluster) fails to connect to HBase :

Logs from the driver :

17/10/23 14:02:18 ERROR AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
	at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
	at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:642)
	at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:166)
	at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:769)
	at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:766)
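
To make the comparison between the runs easier to see: the invocations above share the same --packages, --repositories, --files, --keytab and --principal arguments and differ only in --master and --deploy-mode. A small sketch that generates them from one shared argument list is shown here. The function name spark_submit_command and the dictionary keys are hypothetical, and the script name is a parameter whose default is only an assumption.

```python
import shlex

# Arguments common to every run, copied from the commands above.
COMMON_ARGS = [
    "--packages", "com.hortonworks:shc:1.0.0-1.6-s_2.10",
    "--repositories", "http://repo.hortonworks.com/content/groups/public/",
    "--files", "/etc/hbase/conf/hbase-site.xml",
    "--keytab=/path/to/my/keytab/myuser.keytab",
    "--principal=myuser@DOMAIN.CORP",
]


def spark_submit_command(master, deploy_mode, script="readExample.py"):
    """Build the spark-submit argument list for one master/deploy-mode pair."""
    return (
        ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
        + COMMON_ARGS
        + [script]
    )


commands = {
    "local": spark_submit_command("local[*]", "client"),
    "yarn-client": spark_submit_command("yarn", "client"),
    "yarn-cluster": spark_submit_command("yarn", "cluster"),
}

for name, argv in commands.items():
    # shlex.quote protects characters such as [*] from shell expansion.
    print(name + ": " + " ".join(shlex.quote(arg) for arg in argv))
```

Since the Kerberos-related arguments are identical in the working local run and in the failing YARN runs, the difference in behavior comes from where the driver and executors run, not from the arguments themselves.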

My questions are rather simple:

- Is it possible for my driver and executors to connect to HBase successfully?

- What should I do, in addition to passing my Kerberos keytab/principal, to make this work?

Thanks for your help

1 Reply

Did you solve this issue?