
Spark can't connect to HBase using Kerberos in Cluster mode

Solved

Re: Spark can't connect to HBase using Kerberos in Cluster mode

Did you get the same error message?

Re: Spark can't connect to HBase using Kerberos in Cluster mode

New Contributor

Yes, I got the same Kerberos credential error that I posted above for loginUserFromKeytab().

When I shipped the files, the error changed slightly to: can't get password from the keytab.

Re: Spark can't connect to HBase using Kerberos in Cluster mode

Contributor

In addition to Josh's recommendations, the configuration details in this KB article are also relevant to setting up Spark-to-HBase connectivity in a secure environment.


Re: Spark can't connect to HBase using Kerberos in Cluster mode

Expert Contributor

First of all, which Spark version are you using? Apache Spark 2.0 supports automatically acquiring HBase security tokens for a job and all of its executors. Apache Spark 1.6 does not have that feature, but in HDP Spark 1.6 we have backported it, so it can also acquire HBase tokens for jobs. The tokens are acquired automatically if 1) security is enabled, 2) hbase-site.xml is present on the client classpath, and 3) that hbase-site.xml has Kerberos security configured. HBase tokens for the HBase master specified in that hbase-site.xml are then acquired and used in the job.
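For reference, the Kerberos-related properties that condition 3) refers to typically look like this in hbase-site.xml (a sketch; the realm and principal values are placeholders and depend on your cluster):

```xml
<!-- Sketch of Kerberos security settings in hbase-site.xml; values are placeholders -->
<property>
  <name>hbase.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hbase.master.kerberos.principal</name>
  <value>hbase/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>hbase.regionserver.kerberos.principal</name>
  <value>hbase/_HOST@EXAMPLE.COM</value>
</property>
```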

In order to obtain the tokens, the Spark client needs to use HBase code, so the specific HBase jars need to be present on the client classpath. This is documented on the SHC GitHub page; search for "secure" on that page.
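As a sketch of what that submission can look like (jar names, versions, and paths are placeholders and vary by distribution; check the SHC page for the exact list):

```shell
# Sketch of a secure-cluster submission: ship hbase-site.xml and put
# the HBase client jars on the classpath. Paths/versions are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /etc/hbase/conf/hbase-site.xml \
  --jars /usr/hdp/current/hbase-client/lib/hbase-client.jar,\
/usr/hdp/current/hbase-client/lib/hbase-common.jar,\
/usr/hdp/current/hbase-client/lib/hbase-server.jar,\
/usr/hdp/current/hbase-client/lib/hbase-protocol.jar \
  --class com.example.MyHBaseJob \
  my-job.jar
```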

To access HBase inside the Spark jobs, the job obviously needs the HBase jars to be present for the driver and/or executors. That would already be part of your existing job submission for non-secure clusters, which I assume works.

If this job is going to be long running and run beyond the token expiry time (typically 7 days), then you need to submit the Spark job with the --keytab and --principal options so that Spark can use that keytab to re-acquire tokens before the current ones expire.
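A sketch of such a submission (the principal and keytab path are placeholders):

```shell
# Sketch: let Spark re-acquire tokens from the keytab for a long-running job.
# Principal and keytab path are placeholders for your environment.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal myuser@EXAMPLE.COM \
  --keytab /etc/security/keytabs/myuser.keytab \
  --class com.example.MyHBaseJob \
  my-job.jar
```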

Re: Spark can't connect to HBase using Kerberos in Cluster mode

New Contributor

Hi Bikas, if I want to use an HBase Connection directly to access HBase, would Apache Spark 2.2 refresh the token for me? If yes, how do I get the connection object, just by calling ConnectionFactory.createConnection(conf)?
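For context, the direct-connection approach being asked about looks roughly like this (a sketch; error handling and connection reuse are omitted, and whether Spark refreshes tokens for such a connection is exactly the open question here):

```java
// Sketch: opening an HBase connection directly, e.g. inside an executor task.
// HBaseConfiguration.create() picks up hbase-site.xml from the classpath.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class DirectConnectionSketch {
    public static Connection open() throws java.io.IOException {
        Configuration conf = HBaseConfiguration.create();
        // Relies on the job's Kerberos credentials / delegation tokens
        return ConnectionFactory.createConnection(conf);
    }
}
```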

Re: Spark can't connect to HBase using Kerberos in Cluster mode

Contributor

Hi Josh, should it also work when we use the function saveAsNewAPIHadoopDataset over an RDD of JavaPairRDD&lt;ImmutableBytesWritable, Put&gt;? I tried with and without the doAs and I was not able to make it work. I don't get any errors; nothing happens at all. Any idea? Thanks, Michel
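For context, the write path being asked about typically looks like this (a sketch; the table name is a placeholder and the RDD is assumed to be built elsewhere):

```java
// Sketch: writing a JavaPairRDD<ImmutableBytesWritable, Put> to HBase
// via saveAsNewAPIHadoopDataset. Table name is a placeholder.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.api.java.JavaPairRDD;

public class SaveSketch {
    public static void save(JavaPairRDD<ImmutableBytesWritable, Put> rdd)
            throws java.io.IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set(TableOutputFormat.OUTPUT_TABLE, "my_table"); // placeholder
        Job job = Job.getInstance(conf);
        job.setOutputFormatClass(TableOutputFormat.class);
        rdd.saveAsNewAPIHadoopDataset(job.getConfiguration());
    }
}
```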

Re: Spark can't connect to HBase using Kerberos in Cluster mode

New Contributor

I have tried both approaches and end up getting the same error message:

Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user

The same keytab file works just fine for an interactive login to HBase. Also, the same code works fine when I submit the job with "local[*]" as master instead of YARN.

Any pointers?

Re: Spark can't connect to HBase using Kerberos in Cluster mode

New Contributor

In hbase-site.xml, hbase.coprocessor.region.classes should also contain

org.apache.hadoop.hbase.security.token.TokenProvider
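In hbase-site.xml that would look roughly like this (a sketch; if the property already lists other coprocessor classes, keep them in the comma-separated value rather than replacing them):

```xml
<!-- Sketch: TokenProvider must be among the region coprocessors so that
     clients can obtain HBase delegation tokens. -->
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.token.TokenProvider</value>
</property>
```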
