08-06-2017 11:24 AM
I want to access HBase from inside my Spark code.
To access HBase from Spark, I have used UserGroupInformation:
Configuration conf = HBaseConfiguration.create();
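Only the first line of the code is shown above. For context, here is a minimal sketch of what a UserGroupInformation keytab login followed by an HBase read typically looks like with the HBase 1.x client API. It is not the exact code from the job. The class name HBaseKerberosSketch, the helper names, and the command-line arguments (principal, keytab path, table, row key) are hypothetical, and the two authentication properties are set explicitly only to make the assumption visible; normally they come from core-site.xml and hbase-site.xml on the classpath.

```java
import java.io.IOException;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical sketch class; requires hadoop-common and hbase-client on the classpath.
public class HBaseKerberosSketch {

    // Builds an HBase client configuration with Kerberos authentication enabled.
    // Assumption: on a real cluster these values are read from the site XML files.
    static Configuration secureConf() {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hadoop.security.authentication", "kerberos");
        conf.set("hbase.security.authentication", "kerberos");
        return conf;
    }

    // Logs in from the keytab and checks, as that user, whether a row exists.
    static boolean rowExists(final Configuration conf, String principal, String keytabPath,
                             final String tableName, final String rowKey)
            throws IOException, InterruptedException {
        // Tell UGI to use the Kerberos-enabled configuration before logging in.
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation ugi =
                UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytabPath);

        // Run the HBase calls with the credentials of the logged-in user.
        return ugi.doAs(new PrivilegedExceptionAction<Boolean>() {
            @Override
            public Boolean run() throws IOException {
                try (Connection connection = ConnectionFactory.createConnection(conf);
                     Table table = connection.getTable(TableName.valueOf(tableName))) {
                    Result result = table.get(new Get(Bytes.toBytes(rowKey)));
                    return !result.isEmpty();
                }
            }
        });
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical arguments: <principal> <keytab path> <table> <row key>
        boolean found = rowExists(secureConf(), args[0], args[1], args[2], args[3]);
        System.out.println("Row found: " + found);
    }
}
```

Running the read inside doAs keeps the HBase connection tied to the keytab login rather than to whatever credentials the surrounding process happens to hold, which is why this pattern is commonly used when a client has to authenticate on its own.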
I am unable to access HBase from the Spark code. It logs the errors below, and finally the job fails after the
709d40c03cb, negotiated timeout = 60000
17/08/06 18:10:01 WARN spark.SparkContext: Killing executors is only supported in coarse-grained mode
17/08/06 18:10:01 WARN spark.ExecutorAllocationManager: Unable to reach the cluster manager to kill executor driver!
17/08/06 18:10:36 INFO client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=48580 ms ago, cancelled=false, msg=
However, if I access HBase on its own (without Spark) from a simple Java program, I am able to access HBase in the Kerberized cluster.
I am using CDH 5.8 with Spark 1.6.
I run the Spark job by passing the principal and keytab, and inside the Spark code I use UserGroupInformation for HBase access.
spark-submit --keytab /home/centos/spark_on_yarn.keytab --principal spark/ip-xxxxxx.us-west-2.xx.xxxx@xxxxxxx.COM --class SparkReadAndPrint --deploy-mode client --master local /home/centos/newspark.jar /user/centos/bank.txt
I also tried passing hbase-site.xml via the --files parameter.
spark-submit --keytab /home/centos/spark_on_yarn.keytab --principal spark/ip-xxxxxx.us-west-2.xx.xxxx@xxxxxxx.COM --files "hbase-site.xml,hbase.keytab,hdfs-site.xml,hdfs.keytab" --class SparkReadAndPrint --deploy-mode client --master local /home/centos/newspark.jar /user/centos/bank.txt
I even checked the /etc/spark/ configuration, and hbase-site.xml is present in the config path.
It would be great if you could answer the questions below:
a) What is wrong with the above approach?
b) Is there any other way to access HBase from Spark code?
c) I am using the principal and keytab generated by Cloudera Manager for both Spark and HBase access.
Is there another approach, such as creating a keytab that contains both of these principals and using it for access?
d) What is the best approach for accessing multiple services such as Spark, HBase, and Kafka in a Kerberized cluster? Do we need to create a single keytab and principal for accessing these services?
It would be great if you could clarify the above points. Thanks in advance.