<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Problems using hbase-spark on CDH in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Problems-using-hbase-spark-on-CDH/m-p/359158#M238036</link>
    <description>&lt;P&gt;Actually, I figured it out. First, go to /etc/spark/conf.cloudera.spark_on_yarn/classpath.txt and delete the last line (which contains the path to hbase-class.jar). Then download hbase-spark-1.0.0.7.2.15.0-147.jar, and when you run spark-shell, add --jars pathToYourDownloadedJar. Finally, add .option("hbase.spark.pushdown.columnfilter", false) before loading the data, like this:&lt;/P&gt;&lt;P&gt;val sql = spark.sqlContext&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;val df = sql.read.format("org.apache.hadoop.hbase.spark").option("hbase.columns.mapping", "name STRING :key, email STRING c:email, " + "birthDate STRING p:birthDate, height FLOAT p:height").option("hbase.table", "person").option("hbase.spark.use.hbasecontext", false).option("hbase.spark.pushdown.columnfilter", false).load()&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;df.createOrReplaceTempView("personView")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;val results = sql.sql("SELECT * FROM personView where name = 'alice'")&lt;/P&gt;&lt;P&gt;results.show()&lt;/P&gt;</description>
    <pubDate>Fri, 09 Dec 2022 04:15:56 GMT</pubDate>
    <dc:creator>quangbilly79</dc:creator>
    <dc:date>2022-12-09T04:15:56Z</dc:date>
    <item>
      <title>Problems using hbase-spark on CDH</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Problems-using-hbase-spark-on-CDH/m-p/356333#M237257</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am trying to use hbase-spark in order to query HBase with spark-sql, but I am stuck with one of these exceptions:&lt;/P&gt;&lt;P&gt;java.lang.NullPointerException&lt;/P&gt;&lt;P&gt;or&lt;/P&gt;&lt;P&gt;java.lang.NoSuchMethodError: org.apache.hadoop.hbase.util.ByteStringer.wrap([B)Lcom/google/protobuf/ByteString;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Contents:&lt;/P&gt;&lt;P&gt;1- Details on the platform&lt;/P&gt;&lt;P&gt;2- Details on the problem&lt;/P&gt;&lt;P&gt;3- Description of the attached content&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1- A Cloudera cluster running CDH-6.3.4&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2- I run a (POC) Java application with spark-submit in yarn cluster mode. The application sequentially does the following:&lt;/P&gt;&lt;P&gt;&amp;nbsp;- Create an HBase table and populate it using the Java API&lt;/P&gt;&lt;P&gt;&amp;nbsp;- Use a SparkSession to read the HBase table (using&amp;nbsp;.format("org.apache.hadoop.hbase.spark") )&lt;/P&gt;&lt;P&gt;&amp;nbsp;- Perform some queries on the Dataframe.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For now, I have only had partial success on the last step. 
For example, I can show the contents of the dataframe with:&lt;/P&gt;&lt;PRE&gt;Dataset&amp;lt;Row&amp;gt; sqlDF1 = sqlContext.sql("SELECT * FROM census2");&lt;BR /&gt;sqlDF1.show(100, false);&lt;/PRE&gt;&lt;P&gt;But the following code fails with one of the two exceptions listed at the start of the post:&lt;/P&gt;&lt;PRE&gt;Dataset&amp;lt;Row&amp;gt; sqlDF2 = sqlContext.sql("SELECT * FROM census2 WHERE ID1 LIKE '____|001_|%'");&lt;BR /&gt;sqlDF2.show(100, false);&lt;/PRE&gt;&lt;P&gt;Concerning this exception:&lt;/P&gt;&lt;P&gt;java.lang.NoSuchMethodError: org.apache.hadoop.hbase.util.ByteStringer.wrap([B)Lcom/google/protobuf/ByteString;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I see that the class is provided by the hbase-protocol project, as can be seen here:&amp;nbsp;&lt;A href="https://github.com/apache/hbase/blob/rel/2.1.0/hbase-protocol/src/main/java/org/apache/hadoop/hbase/util/ByteStringer.java" target="_blank" rel="noopener"&gt;hbase/ByteStringer.java at rel/2.1.0 · apache/hbase · GitHub&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I included the jar with the --jars option of spark-submit, and it is also present in the uber-jar that is launched. 
So I don't see why I get this error.&lt;/P&gt;&lt;P&gt;I tried both the hbase-spark 2.1.0-cdh6.3.4 Maven Central dependency and an hbase-spark library that I compiled myself, but neither helped.&lt;/P&gt;&lt;P&gt;I also tried to add this to my SparkSession:&lt;/P&gt;&lt;PRE&gt;//                .config("spark.jars", "hbase-spark-1.0.0.jar:hbase-protocol-2.1.0.jar")&lt;/PRE&gt;&lt;P&gt;But then I get a NullPointerException and cannot even print the dataframes.&lt;/P&gt;&lt;P&gt;I also tried to add:&lt;/P&gt;&lt;PRE&gt;.option("hbase.spark.use.hbasecontext", false)&lt;/PRE&gt;&lt;P&gt;when reading the dataframe (as I found someone suggesting that), but it did not help either.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;3- Description of the attached content&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Main.java.txt =&amp;gt; the code of the sample application&lt;/P&gt;&lt;P&gt;- launcher.sh.txt =&amp;gt; the bash code used to launch the application&lt;/P&gt;&lt;P&gt;- jars_and_classpaths.txt =&amp;gt; the jars passed to the --jars option, as well as the Java client classpath&lt;/P&gt;&lt;P&gt;- mvn_dependency_tree.txt =&amp;gt; the output of the command mvn dependency:tree&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am stuck here; could someone help me?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks a lot&lt;/P&gt;</description>
      <pubDate>Thu, 27 Oct 2022 16:42:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Problems-using-hbase-spark-on-CDH/m-p/356333#M237257</guid>
      <dc:creator>Jean-Luc</dc:creator>
      <dc:date>2022-10-27T16:42:47Z</dc:date>
    </item>
    <item>
      <title>Re: Problems using hbase-spark on CDH</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Problems-using-hbase-spark-on-CDH/m-p/356347#M237260</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/101433"&gt;@Jean-Luc&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You can try the following example code&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/rangareddy/ranga_spark_experiments/tree/master/spark_hbase_cdh_integration" target="_blank"&gt;https://github.com/rangareddy/ranga_spark_experiments/tree/master/spark_hbase_cdh_integration&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2022 03:21:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Problems-using-hbase-spark-on-CDH/m-p/356347#M237260</guid>
      <dc:creator>RangaReddy</dc:creator>
      <dc:date>2022-10-28T03:21:06Z</dc:date>
    </item>
    <item>
      <title>Re: Problems using hbase-spark on CDH</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Problems-using-hbase-spark-on-CDH/m-p/356367#M237268</link>
      <description>&lt;P&gt;Hi RangaReddy,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I looked into your solution; however, the only actions you perform to test the hbase-spark interaction are these:&lt;/P&gt;&lt;PRE&gt;employeeDf.printSchema()&lt;BR /&gt;employeeDf.show(truncate=false)&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But I already succeed at both of these actions.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What I want to do is perform "advanced" SQL operations on the dataframe, namely filtering on the row keys and qualifier values.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example:&amp;nbsp;sqlContext.sql("SELECT * FROM census2 WHERE ID1 LIKE '____|001_|%'");&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Did you try that in your experiment?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I found this on the master branch of hbase-connectors/spark (&lt;A href="https://github.com/apache/hbase-connectors/tree/master/spark" target="_blank" rel="noopener"&gt;hbase-connectors/spark at master · apache/hbase-connectors · GitHub&lt;/A&gt;):&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Server-side&lt;/STRONG&gt;&amp;nbsp;(HBase region servers) configuration:&lt;/P&gt;&lt;P&gt;The following jars need to be in the CLASSPATH of the HBase region servers:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;scala-library, hbase-spark, and hbase-spark-protocol-shaded.&lt;/LI&gt;&lt;LI&gt;The server-side configuration is needed for column filter pushdown&lt;/LI&gt;&lt;LI&gt;if you cannot perform the server-side configuration, consider using&amp;nbsp;.option("hbase.spark.pushdown.columnfilter", false)&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So the --jars option of spark-submit does make the jars accessible to the Spark driver and executors, but when a qualifier filter is applied, Spark must be delegating some work to the HBase region servers, so the jars need to be in the classpath of the region servers' Java processes too?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2022 08:24:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Problems-using-hbase-spark-on-CDH/m-p/356367#M237268</guid>
      <dc:creator>Jean-Luc</dc:creator>
      <dc:date>2022-10-28T08:24:52Z</dc:date>
    </item>
    <item>
      <title>Re: Problems using hbase-spark on CDH</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Problems-using-hbase-spark-on-CDH/m-p/359111#M238020</link>
      <description>&lt;P&gt;Did you solve this?&lt;/P&gt;</description>
      <pubDate>Thu, 08 Dec 2022 08:12:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Problems-using-hbase-spark-on-CDH/m-p/359111#M238020</guid>
      <dc:creator>quangbilly79</dc:creator>
      <dc:date>2022-12-08T08:12:09Z</dc:date>
    </item>
    <item>
      <title>Re: Problems using hbase-spark on CDH</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Problems-using-hbase-spark-on-CDH/m-p/359158#M238036</link>
      <description>&lt;P&gt;Actually, I figured it out. First, go to /etc/spark/conf.cloudera.spark_on_yarn/classpath.txt and delete the last line (which contains the path to hbase-class.jar). Then download hbase-spark-1.0.0.7.2.15.0-147.jar, and when you run spark-shell, add --jars pathToYourDownloadedJar. Finally, add .option("hbase.spark.pushdown.columnfilter", false) before loading the data, like this:&lt;/P&gt;&lt;P&gt;val sql = spark.sqlContext&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;val df = sql.read.format("org.apache.hadoop.hbase.spark").option("hbase.columns.mapping", "name STRING :key, email STRING c:email, " + "birthDate STRING p:birthDate, height FLOAT p:height").option("hbase.table", "person").option("hbase.spark.use.hbasecontext", false).option("hbase.spark.pushdown.columnfilter", false).load()&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;df.createOrReplaceTempView("personView")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;val results = sql.sql("SELECT * FROM personView where name = 'alice'")&lt;/P&gt;&lt;P&gt;results.show()&lt;/P&gt;</description>
      <pubDate>Fri, 09 Dec 2022 04:15:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Problems-using-hbase-spark-on-CDH/m-p/359158#M238036</guid>
      <dc:creator>quangbilly79</dc:creator>
      <dc:date>2022-12-09T04:15:56Z</dc:date>
    </item>
    <item>
      <title>Re: Problems using hbase-spark on CDH</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Problems-using-hbase-spark-on-CDH/m-p/359159#M238037</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/102287"&gt;@quangbilly79&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You have used CDP&amp;nbsp;&lt;SPAN&gt;hbase-spark-1.0.0.7.2.15.0-147.jar instead of CDH. There is no guarantee it will work latest jar in CDH. Luckily for you it is worked.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 09 Dec 2022 04:30:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Problems-using-hbase-spark-on-CDH/m-p/359159#M238037</guid>
      <dc:creator>RangaReddy</dc:creator>
      <dc:date>2022-12-09T04:30:39Z</dc:date>
    </item>
  </channel>
</rss>

