Created 08-21-2020 09:21 AM
I have an issue while verifying that I can connect to HBase from Spark (via spark-shell, so through Scala).
Scala version: 2.11.7
Spark version: 2.2.1
HBase version: 1.3.1
I followed this example: https://docs.cloudera.com/runtime/7.1.1/managing-hbase/topics/hbase-example-using-hbase-spark-connec... . I had to load quite a few extra HBase-related jars (hbase-client, hbase-common, hbase-spark) to get all of the commands in that example to work.
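For completeness, this is roughly how I start the shell with those jars (the paths and versions below are placeholders for my local copies, not the exact ones I use):

spark-shell --jars /path/to/hbase-client-1.3.1.jar,/path/to/hbase-common-1.3.1.jar,/path/to/hbase-spark.jar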
It is only at the end, when I read/write, that I get an error. This is the full example with output:
scala> import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.Path
scala> import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.HBaseConfiguration
scala> val conf = HBaseConfiguration.create()
warning: Class org.apache.hadoop.hbase.classification.InterfaceAudience not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceStability not found - continuing with a stub.
conf: org.apache.hadoop.conf.Configuration = Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hbase-default.xml, hbase-site.xml
scala> conf.addResource(new Path("/home/jb/Documents/cloud-local/hbase-1.3.1/conf/hbase-site.xml"))
scala> import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.spark.HBaseContext
scala> new HBaseContext(sc, conf) // "sc" is the SparkContext you created earlier.
warning: Class org.apache.hadoop.hbase.classification.InterfaceAudience not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceStability not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceAudience not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceAudience not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.exceptions.DeserializationException not found - continuing with a stub.
error: Class org.apache.hadoop.hbase.exceptions.DeserializationException not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.protobuf.generated.ClientProtos not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceAudience not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceStability not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceAudience not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceAudience not found - continuing with a stub.
warning: Class org.apache.hadoop.hbase.classification.InterfaceStability not found - continuing with a stub.
scala> val sql = spark.sqlContext
sql: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@76e00bdb
scala> import java.sql.Date
import java.sql.Date
scala> case class Person(name: String,
| email: String,
| birthDate: java.sql.Date,
| height: Float)
defined class Person
scala> var personDS = Seq(
| Person("alice", "alice@alice.com", Date.valueOf("2000-01-01"), 4.5f),
| Person("bob", "bob@bob.com", Date.valueOf("2001-10-17"), 5.1f)
| ).toDS
personDS: org.apache.spark.sql.Dataset[Person] = [name: string, email: string ... 2 more fields]
scala> personDS.write.format("org.apache.hadoop.hbase.spark")
     |   .option("hbase.columns.mapping", "name STRING :key, email STRING c:email,birthDate DATE p:birthDate, height FLOAT p:height")
     |   .option("hbase.table", "person")
     |   .option("hbase.spark.use.hbasecontext", false)
     |   .save()
name STRING :key, email STRING c:email,birthDate DATE p:birthDate, height FLOAT p:height
java.lang.NullPointerException
at org.apache.hadoop.hbase.spark.HBaseRelation.<init>(DefaultSource.scala:139)
at org.apache.hadoop.hbase.spark.DefaultSource.createRelation(DefaultSource.scala:79)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:469)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:50)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
... 48 elided
This is the same error as in this post (https://stackoverflow.com/questions/52372245/hbase-spark-load-data-raise-nullpointerexception-error-...), so I already added the conf.addResource(...) call shown above to make sure the details of hbase-site.xml are included, but that does not fix it.
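To double-check that hbase-site.xml is really being picked up, a quick probe like this can be run in the same shell (hbase.zookeeper.quorum is just one representative property from that file):

scala> conf.get("hbase.zookeeper.quorum")

If that comes back null, the file is not on the configuration at all.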
This is the createRelation method in DefaultSource:
public BaseRelation createRelation(final SQLContext sqlContext, final SaveMode mode,
                                   final Map<String, String> parameters, final Dataset<Row> data) {
    // The NPE is thrown inside this constructor call: the stack trace points at
    // HBaseRelation.<init>(DefaultSource.scala:139), not at createRelation itself.
    final HBaseRelation relation = new HBaseRelation(parameters, new Some(data.schema()), sqlContext);
    relation.createTable();
    relation.insert(data, false);
    return relation;
}
I guess one of the constructor's inputs must be null, but I can't figure out which one.
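One thing I can still inspect from the same shell is whether the connector ever cached an HBaseContext; in the hbase-spark sources the HBaseContext constructor registers itself in LatestHBaseContextCache, so (assuming that object is public in this build) this should show whether the new HBaseContext(sc, conf) call above actually completed:

scala> org.apache.hadoop.hbase.spark.LatestHBaseContextCache.latest

If that is null, the stub errors above may have prevented the context from being constructed, although with hbase.spark.use.hbasecontext set to false the connector is, as I understand it, supposed to build its own context instead.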
Input welcome!