Created 03-01-2017 03:16 PM
Hi,
We are on HDP 2.4.2 with Spark 1.6 (compiled with Scala 2.10.5). HBase is version 1.1.2.2.4.2.0-258.
The environment is a basic dev cluster (<10 nodes) with HBase and Spark running in cluster mode.
Attempts to use the Spark HBase connector to load some data from HBase into a Spark DataFrame are failing with the following error:
Exception in thread "main" java.lang.UnsupportedOperationException: empty.tail
    at scala.collection.TraversableLike$class.tail(TraversableLike.scala:445)
    at scala.collection.mutable.ArraySeq.scala$collection$IndexedSeqOptimized$super$tail(ArraySeq.scala:45)
    at scala.collection.IndexedSeqOptimized$class.tail(IndexedSeqOptimized.scala:123)
    at scala.collection.mutable.ArraySeq.tail(ArraySeq.scala:45)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.initRowKey(HBaseTableCatalog.scala:150)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.<init>(HBaseTableCatalog.scala:164)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:239)
    at hbaseReaderHDPCon$.main(hbaseReaderHDPCon.scala:42)
    at hbaseReaderHDPCon.main(hbaseReaderHDPCon.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Line 42 of my code, where this happens, is the `HBaseTableCatalog(...)` call below:
val cat = s"""{
    |"table":{"namespace":"myTable", "name":"person", "tableCoder":"PrimitiveType"},
    |"rowkey":"ROW",
    |"columns":{
    |"col0":{"cf":"person", "col":"detail", "type":"string"}
    |}
    |}""".stripMargin

val scon = new SparkConf()
val sparkContext = new SparkContext(scon)
sparkContext.hadoopConfiguration.set("spark.hbase.host", "my ZooKeeper quorum")
val sqlContext = new SQLContext(sparkContext)
val m = HBaseTableCatalog(Map(HBaseTableCatalog.tableCatalog -> cat))

def withCatalog(cat: String): DataFrame = {
  sqlContext
    .read
    .options(Map(HBaseTableCatalog.tableCatalog -> cat))
    .format("org.apache.spark.sql.execution.datasources.hbase")
    .load()
}
Created 03-13-2017 04:00 PM
In your "columns" map ("col0":{"cf":"person", "col":"detail", "type":"string"} is the only entry), you are missing the row-key mapping. The catalog must map at least one column to the HBase row key; without it, `initRowKey` finds no key columns and fails with `empty.tail`.
For example, here the id column is mapped to the HBase row key:
"id":{"cf":"rowkey", "col":"key", "type":"string"},
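Putting that together, a corrected catalog for the asker's `person` table might look like the sketch below. The `id` field name is an assumption (any name works, as long as it is mapped to the reserved `rowkey` column family and `key` column), and the top-level "rowkey" value is changed to "key" so it matches the `col` of the row-key entry:

```scala
// Corrected catalog: the "columns" map now contains a row-key entry ("id",
// an assumed name) pointing at cf "rowkey" / col "key", and the top-level
// "rowkey" field names that same "key" column. This is what HBaseTableCatalog
// needs to build its row key, avoiding the empty.tail error.
val cat = s"""{
    |"table":{"namespace":"myTable", "name":"person", "tableCoder":"PrimitiveType"},
    |"rowkey":"key",
    |"columns":{
    |"id":{"cf":"rowkey", "col":"key", "type":"string"},
    |"col0":{"cf":"person", "col":"detail", "type":"string"}
    |}
    |}""".stripMargin

// Reading then proceeds as in the original code (commented out here because
// it needs a live cluster and the SHC package on the classpath):
// val df = sqlContext.read
//   .options(Map(HBaseTableCatalog.tableCatalog -> cat))
//   .format("org.apache.spark.sql.execution.datasources.hbase")
//   .load()
```

With this catalog, the resulting DataFrame exposes two columns, `id` (the row key) and `col0`.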