10-28-2022
01:24 AM
Hi RangaReddy,

I looked into your solution, but the only actions you perform to test the hbase-spark interaction are these:

    employeeDf.printSchema()
    employeeDf.show(truncate=false)

Both of these already succeed for me. What I want to do is perform more "advanced" SQL operations on the dataframe, namely filtering on the row keys and qualifier values. For example:

    sqlContext.sql("SELECT * FROM census2 WHERE ID1 LIKE '____|001_|%'");

Did you try that in your experiment?

I found this on the master branch of hbase-connectors/spark (hbase-connectors/spark at master · apache/hbase-connectors · GitHub):

    Server-side (HBase region servers) configuration: The following jars need to be in the CLASSPATH of the HBase region servers: scala-library, hbase-spark, and hbase-spark-protocol-shaded. The server-side configuration is needed for column filter pushdown. If you cannot perform the server-side configuration, consider using .option("hbase.spark.pushdown.columnfilter", false)

So the --jars option of spark-submit does make the jars accessible to the Spark driver and executors, but when you run a qualifier filter operation, Spark must somehow be delegating some work to the HBase region servers, so the jars need to be on the classpath of the region servers' Java processes too?

Thanks,
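As a rough illustration of the workaround quoted from the README, the reader options could be collected in a map with column filter pushdown turned off. This is only a sketch: the class name HBaseReadOptions and the column mapping string are hypothetical (the real mapping for census2 is not shown in this thread), while the option keys are the ones named in the post and in the hbase-connectors documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the options passed to the hbase-spark data source.
public class HBaseReadOptions {

    // Builds the reader options; pushdown is controlled by the last argument.
    public static Map<String, String> build(String tableName,
                                            String columnsMapping,
                                            boolean pushDownColumnFilter) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("hbase.table", tableName);
        options.put("hbase.columns.mapping", columnsMapping);
        // "false" keeps filter evaluation on the Spark side, so the region
        // servers do not need the hbase-spark jars on their classpath.
        options.put("hbase.spark.pushdown.columnfilter",
                    String.valueOf(pushDownColumnFilter));
        return options;
    }

    public static void main(String[] args) {
        // The column mapping below is an assumption made for illustration only.
        Map<String, String> options =
            build("census2", "ID1 STRING :key, NAME STRING cf:name", false);
        options.forEach((k, v) -> System.out.println(k + " = " + v));

        // With a SparkSession named spark, the map would be used roughly like this:
        // Dataset<Row> df = spark.read()
        //     .format("org.apache.hadoop.hbase.spark")
        //     .options(options)
        //     .load();
    }
}
```

With pushdown disabled, the filter is applied by Spark after the scan, which may read more data from HBase than a pushed-down filter would.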
10-27-2022
09:42 AM
Hello,

I am trying to use hbase-spark to query HBase with spark-sql, but I am stuck on one of these exceptions:

    java.lang.NullPointerException

or

    java.lang.NoSuchMethodError: org.apache.hadoop.hbase.util.ByteStringer.wrap([B)Lcom/google/protobuf/ByteString;

Contents:
1- Details on the platform
2- Details on the problem
3- Description of the attached content

1- Details on the platform

A Cloudera cluster running CDH-6.3.4.

2- Details on the problem

I run a (POC) Java application with spark-submit in yarn cluster mode. The application does the following, in sequence:
- Creates an HBase table and populates it using the Java API
- Uses a Spark session to read the HBase table (using .format("org.apache.hadoop.hbase.spark"))
- Performs some queries on the dataframe

For now, I only have partial success on the last step. I can show the contents of the dataframe, for example with this:

    Dataset<Row> sqlDF1 = sqlContext.sql("SELECT * FROM census2");
    sqlDF1.show(100, false);

But the following code fails with one of the two exceptions listed at the start of the post:

    Dataset<Row> sqlDF2 = sqlContext.sql("SELECT * FROM census2 WHERE ID1 LIKE '____|001_|%'");
    sqlDF2.show(100, false);

Concerning this exception:

    java.lang.NoSuchMethodError: org.apache.hadoop.hbase.util.ByteStringer.wrap([B)Lcom/google/protobuf/ByteString;

I see that the class is provided by the hbase-protocol project, as can be seen here: hbase/ByteStringer.java at rel/2.1.0 · apache/hbase · GitHub

I included the jar with the --jars option of spark-submit, and it is also present in the uber-jar that is launched, so I don't see why I get this error.

I tried both the hbase-spark 2.1.0-cdh6.3.4 Maven Central dependency and an hbase-spark library that I compiled myself, but neither helped.

I also tried adding this to my Spark session:

    // .config("spark.jars", "hbase-spark-1.0.0.jar:hbase-protocol-2.1.0.jar")

But then I get a NullPointerException and cannot even print the dataframes.

I also tried adding

    .option("hbase.spark.use.hbasecontext", false)

when reading the dataframe (I found someone suggesting that), but it did not help either.

3- Description of the attached content

- Main.java.txt => the code of the sample application
- launcher.sh.txt => the bash code used to launch the application
- jars_and_classpaths.txt => the jars passed to the --jars option, as well as the Java client classpath
- mvn_dependency_tree.txt => the output of the command mvn dependency:tree

I am stuck here, could someone help me?

Thanks a lot
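For readers unfamiliar with the filter in the failing query, here is a small sketch of what the LIKE pattern '____|001_|%' selects. In SQL LIKE, `_` matches exactly one character and `%` matches any sequence of characters. The class LikePatternSketch and the sample row keys are hypothetical and only illustrate the pattern; this is not how Spark or hbase-spark evaluates the filter internally, and escape characters are not handled.

```java
import java.util.regex.Pattern;

// Hypothetical helper: translates a simple SQL LIKE pattern into a Java regex.
public class LikePatternSketch {

    public static Pattern toRegex(String like) {
        StringBuilder regex = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '_') {
                regex.append('.');        // exactly one character
            } else if (c == '%') {
                regex.append(".*");       // any sequence, including empty
            } else {
                // Quote literals so that characters such as '|' lose their regex meaning.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    public static boolean matches(String value, String like) {
        return toRegex(like).matcher(value).matches();
    }

    public static void main(String[] args) {
        String like = "____|001_|%";
        // Sample row keys are made up for illustration.
        System.out.println(matches("2020|0015|abc", like)); // true
        System.out.println(matches("2020|0025|abc", like)); // false: second field does not start with 001
        System.out.println(matches("20|0015|abc", like));   // false: first field is not four characters
    }
}
```

In other words, the query keeps rows whose key has a four-character first field, a second field made of 001 followed by one character, and anything after the second separator.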
Labels:
- Apache Hadoop
- Apache HBase
- Apache Spark