
Error reading/writing from Spark to HBase (NullPointerException in HBaseRelation/createRelation)

I have an issue while verifying that I can connect to HBase from Spark (from spark-shell, so through Scala).

Scala version: 2.11.7
Spark version: 2.2.1
HBase version: 1.3.1

I followed this example: https://docs.cloudera.com/runtime/7.1.1/managing-hbase/topics/hbase-example-using-hbase-spark-connec... . To get all the commands in that example to work I had to load quite a few extra HBase-related JARs (hbase-client, hbase-common, hbase-spark).
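For reference, this is how I double-check from inside the shell which HBase JARs actually got registered (just a sanity check; SparkContext.listJars lists the jars added to the context):

// list the HBase-related jars Spark knows about in this session
sc.listJars().filter(_.contains("hbase")).foreach(println)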
Only at the end, when I read/write, do I get an error. This is the full example with output:

scala> import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.Path
scala> import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.HBaseConfiguration
scala> val conf = HBaseConfiguration.create()
warning: Class org.apache.hadoop.hbase.classification.InterfaceAudience not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceStability not found - continuing with a stub.
conf: org.apache.hadoop.conf.Configuration = Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hbase-default.xml, hbase-site.xml
scala> conf.addResource(new Path("/home/jb/Documents/cloud-local/hbase-1.3.1/conf/hbase-site.xml"))
scala> import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.spark.HBaseContext
scala> new HBaseContext(sc, conf) // "sc" is the SparkContext you created earlier.
warning: Class org.apache.hadoop.hbase.classification.InterfaceAudience not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceStability not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceAudience not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceAudience not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.exceptions.DeserializationException not found - continuing with a stub.
error: Class org.apache.hadoop.hbase.exceptions.DeserializationException not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.protobuf.generated.ClientProtos not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceAudience not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceStability not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceAudience not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceAudience not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceStability not found - continuing with a stub.
scala> val sql = spark.sqlContext
sql: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@76e00bdb
scala> import java.sql.Date
import java.sql.Date
scala> case class Person(name: String,
     |                  email: String,
     |                  birthDate: java.sql.Date,
     |                  height: Float)
defined class Person
scala> var personDS = Seq(
     |  Person("alice", "alice@alice.com", Date.valueOf("2000-01-01"), 4.5f),
     |  Person("bob", "bob@bob.com", Date.valueOf("2001-10-17"), 5.1f)
     | ).toDS
personDS: org.apache.spark.sql.Dataset[Person] = [name: string, email: string ... 2 more fields]
scala> personDS.write.format("org.apache.hadoop.hbase.spark")
     |   .option("hbase.columns.mapping", "name STRING :key, email STRING c:email,birthDate DATE p:birthDate, height FLOAT p:height")
     |   .option("hbase.table", "person")
     |   .option("hbase.spark.use.hbasecontext", false)
     |   .save()
name STRING :key, email STRING c:email,birthDate DATE p:birthDate, height FLOAT p:height
java.lang.NullPointerException
  at org.apache.hadoop.hbase.spark.HBaseRelation.<init>(DefaultSource.scala:139)
  at org.apache.hadoop.hbase.spark.DefaultSource.createRelation(DefaultSource.scala:79)
  at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:469)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:50)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
  ... 48 elided

This is the same error as in this post (https://stackoverflow.com/questions/52372245/hbase-spark-load-data-raise-nullpointerexception-error-...), so I already added hbase-site.xml to the configuration (the conf.addResource call above) to make sure its settings are included, but that does not fix it.
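To rule out the hbase-site.xml settings silently not being picked up, this is a quick sanity check I run in the shell (my own check; the keys are standard HBase settings):

// both should print values from hbase-site.xml, not null
println(conf.get("hbase.zookeeper.quorum"))
println(conf.get("zookeeper.znode.parent"))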

This is the createRelation method in DefaultSource:

public BaseRelation createRelation(final SQLContext sqlContext, final SaveMode mode,
                                   final Map<String, String> parameters, final Dataset<Row> data) {
    final HBaseRelation relation = new HBaseRelation((Map) parameters, (Option) new Some((Object) data.schema()), sqlContext);
    relation.createTable();
    relation.insert((Dataset) data, false);
    return (BaseRelation) relation;
}
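Since createRelation is just a public method on the data source, one can even call it directly from the shell with the same inputs to reproduce the problem outside of DataFrameWriter (a debugging sketch; these are the same classes the writer uses under the hood):

import org.apache.hadoop.hbase.spark.DefaultSource
import org.apache.spark.sql.SaveMode

// the same options the .option(...) calls above produce
val params = Map(
  "hbase.columns.mapping" -> "name STRING :key, email STRING c:email,birthDate DATE p:birthDate, height FLOAT p:height",
  "hbase.table" -> "person",
  "hbase.spark.use.hbasecontext" -> "false")

// should throw the same NPE from the HBaseRelation constructor
new DefaultSource().createRelation(spark.sqlContext, SaveMode.Append, params, personDS.toDF)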

I guess one of the input parameters must be null, but I can't figure out which one.
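From the shell side, the things we control don't look null; this is the probe I'd run (just a sketch) before suspecting them:

// the SQLContext argument
assert(spark.sqlContext != null)
// the schema wrapped in Some(data.schema())
assert(personDS.schema != null)
// the options map comes from literal .option(...) calls, so its values can't be null

If all of that holds, the stack trace (HBaseRelation.<init> at DefaultSource.scala:139) suggests the null is something the HBaseRelation constructor derives from those options, not one of the arguments themselves.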

Input welcome!
