<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Running Spark job to query Hive HBase tables in a Kerberized cluster in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-Spark-job-to-query-Hive-HBase-tables-in-a-Kerberized/m-p/190373#M59033</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/17157/sudeep-shekharmishra.html" nodeid="17157"&gt;@Sudeep Mishra&lt;/A&gt;, In secure env, you need to add all Hbase dependent jars to SPARK CLASSPATH. Add this configuration to spark-env.sh.&lt;/P&gt;&lt;P&gt;export SPARK_CLASSPATH=&amp;lt;List of Hbase jars separated by &lt;span class="lia-unicode-emoji" title=":grinning_squinting_face:"&gt;😆&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 12 May 2017 03:27:11 GMT</pubDate>
    <dc:creator>yvora</dc:creator>
    <dc:date>2017-05-12T03:27:11Z</dc:date>
    <item>
      <title>Running Spark job to query Hive HBase tables in a Kerberized cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-Spark-job-to-query-Hive-HBase-tables-in-a-Kerberized/m-p/190372#M59032</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am trying to run a &lt;STRONG&gt;Spark 1.6 job (written in Java)&lt;/STRONG&gt; on a &lt;STRONG&gt;Kerberized&lt;/STRONG&gt; cluster.&lt;/P&gt;&lt;P&gt;Through the job I am trying to read data from a Hive table which uses HBase for its storage.&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;SparkConf conf = new SparkConf();
&lt;P&gt;JavaSparkContext context = new JavaSparkContext(conf);&lt;/P&gt;&lt;P&gt;HiveContext hiveContext = new HiveContext(context.sc()); &lt;/P&gt;&lt;P&gt;hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict"); hiveContext.setConf("spark.sql.hive.convertMetastoreOrc", "false"); &lt;/P&gt;&lt;P&gt;hiveContext.setConf("spark.sql.caseSensitive","false");&lt;/P&gt;&lt;P&gt;DataFrame df = hiveContext.sql(task.getQuery());&lt;/P&gt;&lt;P&gt;df.show(100);&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;I am using the below &lt;STRONG&gt;spark-submit&lt;/STRONG&gt; command to run the job on YARN:&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;spark-submit --master yarn --deploy-mode cluster --class &amp;lt;Main class name&amp;gt; --num-executors 2 --executor-cores 1 --executor-memory 1g --driver-memory 1g --jars application.json,/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar,/usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar,/usr/hdp/current/hbase-client/lib/hbase-server.jar,/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar,/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar,/etc/hbase/conf/hbase-site.xml,/usr/hdp/current/spark-client/conf/hive-site.xml data-refiner-1.0.jar&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;I have already performed a &lt;STRONG&gt;kinit&lt;/STRONG&gt; before running the job.&lt;/P&gt;&lt;P&gt;The job is able to communicate with the Hive metastore and parse the query:&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;17/04/05 06:15:23 INFO ParseDriver: Parsing command: SELECT * FROM &amp;lt;db_name&amp;gt;.&amp;lt;table_name&amp;gt;&lt;/P&gt;17/04/05 06:15:24 INFO ParseDriver: Parse 
Completed&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;But when trying to communicate with HBase to get data it is failing with below exception:&lt;/P&gt;&lt;P&gt;17/04/05 06:15:26 WARN AbstractRpcClient: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
17/04/05 06:15:26 ERROR AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:611)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:156)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:737)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:734)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:734)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:887)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:856)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1199)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.execService(ClientProtos.java:32765)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1627)
at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:104)
at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:94)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:107)
at org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:73)
at org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$AuthenticationService$BlockingStub.getAuthenticationToken(AuthenticationProtos.java:4512)
at org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:86)
at org.apache.hadoop.hbase.security.token.TokenUtil$1.run(TokenUtil.java:111)
at org.apache.hadoop.hbase.security.token.TokenUtil$1.run(TokenUtil.java:108)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:313)
at org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:108)
at org.apache.hadoop.hbase.security.token.TokenUtil.addTokenForJob(TokenUtil.java:329)
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.addHBaseDelegationToken(HBaseStorageHandler.java:496)
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureTableJobProperties(HBaseStorageHandler.java:441)
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureInputJobProperties(HBaseStorageHandler.java:342)
at org.apache.spark.sql.hive.HiveTableUtil$.configureJobPropertiesForStorageHandler(TableReader.scala:304)
at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:323)
at org.apache.spark.sql.hive.HadoopTableReader$anonfun$12.apply(TableReader.scala:276)
at org.apache.spark.sql.hive.HadoopTableReader$anonfun$12.apply(TableReader.scala:276)
at org.apache.spark.rdd.HadoopRDD$anonfun$getJobConf$6.apply(HadoopRDD.scala:174)
at org.apache.spark.rdd.HadoopRDD$anonfun$getJobConf$6.apply(HadoopRDD.scala:174)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:174)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:190)
at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165)
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
at org.apache.spark.sql.DataFrame$anonfun$org$apache$spark$sql$DataFrame$execute$1$1.apply(DataFrame.scala:1499)
at org.apache.spark.sql.DataFrame$anonfun$org$apache$spark$sql$DataFrame$execute$1$1.apply(DataFrame.scala:1499)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$execute$1(DataFrame.scala:1498)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$collect(DataFrame.scala:1505)
at org.apache.spark.sql.DataFrame$anonfun$head$1.apply(DataFrame.scala:1375)
at org.apache.spark.sql.DataFrame$anonfun$head$1.apply(DataFrame.scala:1374)
at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1374)
at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1456)
at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:170)
at org.apache.spark.sql.DataFrame.show(DataFrame.scala:350)
at org.apache.spark.sql.DataFrame.show(DataFrame.scala:311)
at com.hpe.eap.batch.EAPDataRefinerMain.main(EAPDataRefinerMain.java:88)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497) &lt;/P&gt;&lt;P&gt;The job runs fine when we query a normal Hive table and also on non-kerberized cluster.&lt;/P&gt;&lt;P&gt;Kindly suggest if we need to modify any configuration parameter/code changes to resolve the issue.&lt;/P&gt;</description>
      <pubDate>Thu, 06 Apr 2017 16:08:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-Spark-job-to-query-Hive-HBase-tables-in-a-Kerberized/m-p/190372#M59032</guid>
      <dc:creator>sudeep-shekhar_</dc:creator>
      <dc:date>2017-04-06T16:08:07Z</dc:date>
    </item>
    <item>
      <title>Re: Running Spark job to query Hive HBase tables in a Kerberized cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-Spark-job-to-query-Hive-HBase-tables-in-a-Kerberized/m-p/190373#M59033</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/17157/sudeep-shekharmishra.html" nodeid="17157"&gt;@Sudeep Mishra&lt;/A&gt;, In secure env, you need to add all Hbase dependent jars to SPARK CLASSPATH. Add this configuration to spark-env.sh.&lt;/P&gt;&lt;P&gt;export SPARK_CLASSPATH=&amp;lt;List of Hbase jars separated by &lt;span class="lia-unicode-emoji" title=":grinning_squinting_face:"&gt;😆&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 12 May 2017 03:27:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-Spark-job-to-query-Hive-HBase-tables-in-a-Kerberized/m-p/190373#M59033</guid>
      <dc:creator>yvora</dc:creator>
      <dc:date>2017-05-12T03:27:11Z</dc:date>
    </item>
    <item>
      <title>Re: Running Spark job to query Hive HBase tables in a Kerberized cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-Spark-job-to-query-Hive-HBase-tables-in-a-Kerberized/m-p/190374#M59034</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/17157/sudeep-shekharmishra.html" nodeid="17157"&gt;@Sudeep Mishra&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Please pass the user keytab in along with spark-submit command.&lt;/P&gt;&lt;PRE&gt;--files /&amp;lt;key_tab_location&amp;gt;/&amp;lt;user_keytab.keytab&amp;gt;&lt;/PRE&gt;&lt;P&gt;This is due to the executors are not authenticated to extract the data from HBase Region servers or any other components.&lt;/P&gt;&lt;P&gt;by passing the keytab all the executors will have the key-tab and able to communicate &lt;/P&gt;</description>
      <pubDate>Wed, 17 May 2017 05:11:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-Spark-job-to-query-Hive-HBase-tables-in-a-Kerberized/m-p/190374#M59034</guid>
      <dc:creator>bkosaraju</dc:creator>
      <dc:date>2017-05-17T05:11:41Z</dc:date>
    </item>
    <item>
      <title>Re: Running Spark job to query Hive HBase tables in a Kerberized cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-Spark-job-to-query-Hive-HBase-tables-in-a-Kerberized/m-p/190375#M59035</link>
      <description>&lt;P&gt;Hi, were you able to resolve this issue? I'm also getting the same error and have tried all the above approaches, but it is still not resolved. I'm only getting the error in cluster mode; local mode works fine.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Mar 2018 19:43:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-Spark-job-to-query-Hive-HBase-tables-in-a-Kerberized/m-p/190375#M59035</guid>
      <dc:creator>pbabbar3</dc:creator>
      <dc:date>2018-03-19T19:43:18Z</dc:date>
    </item>
    <item>
      <title>Re: Running Spark job to query Hive HBase tables in a Kerberized cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-Spark-job-to-query-Hive-HBase-tables-in-a-Kerberized/m-p/190376#M59036</link>
      <description>&lt;P&gt;On Spark 2, setting this in SPARK_CLASSPATH did not work for us, but passing the same set of jars via the executor classpath did.&lt;/P&gt;</description>
      <pubDate>Thu, 06 Sep 2018 21:30:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Running-Spark-job-to-query-Hive-HBase-tables-in-a-Kerberized/m-p/190376#M59036</guid>
      <dc:creator>sapank</dc:creator>
      <dc:date>2018-09-06T21:30:25Z</dc:date>
    </item>
  </channel>
</rss>

