
Exception while using Spark HBase Connector on HDP2.6

Super Collaborator

Hi Guys,

I am using Spark 1.6.3 and HBase 1.1.2 on HDP 2.6. I have to stay on Spark 1.6; moving to Spark 2 is not an option. The connector jar is shc-1.0.0-1.6-s_2.10.jar. I am writing to an HBase table from a PySpark DataFrame:

cat = json.dumps({
    "table": {"namespace": "dsc", "name": "table1", "tableCoder": "PrimitiveType"},
    "rowkey": "key",
    "columns": {
        "individual_id": {"cf": "rowkey", "col": "key", "type": "string"},
        "model_id": {"cf": "cf1", "col": "model_id", "type": "string"},
        "individual_id": {"cf": "cf1", "col": "individual_id", "type": "string"},
        "individual_id_proxy": {"cf": "cf1", "col": "individual_id_proxy", "type": "string"}
    }
})

df.write.option("catalog",cat).format("org.apache.spark.sql.execution.datasources.hbase").save()

The error is:

An error occurred while calling o202.save.
: java.lang.UnsupportedOperationException: empty.tail
    at scala.collection.TraversableLike$class.tail(TraversableLike.scala:445)
    at scala.collection.mutable.ArraySeq.scala$collection$IndexedSeqOptimized$$super$tail(ArraySeq.scala:45)
    at scala.collection.IndexedSeqOptimized$class.tail(IndexedSeqOptimized.scala:123)
    at scala.collection.mutable.ArraySeq.tail(ArraySeq.scala:45)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.initRowKey(HBaseTableCatalog.scala:141)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.<init>(HBaseTableCatalog.scala:152)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:209)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.<init>(HBaseRelation.scala:163)
    at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:58)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)

Please let me know if anyone has come across this.

1 ACCEPTED SOLUTION

Super Collaborator

Solved it. The DataFrame was missing values for the rowkey, as the stack trace points out:

    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.initRowKey(HBaseTableCatalog.scala:141)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.<init>(HBaseTableCatalog.scala:152)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:209)

I created the Row objects so that they included every column named in the catalog, including the rowkey column, and then the write worked.
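
For reference, a minimal sketch of the shape that worked for me. The table and column names come from my catalog; the sample values and the sqlContext variable (the default in a Spark 1.6 PySpark shell) are illustrative. One more thing worth checking: in the original catalog "individual_id" was used twice as a logical column name, and since the argument to json.dumps is a Python dict, the second entry silently overwrites the first, which may be how the rowkey mapping went missing in the first place.

import json
from pyspark.sql import Row

# Catalog with a distinct logical name ("key") for the rowkey mapping,
# so no dict key is duplicated and the "rowkey" column family survives.
cat = json.dumps({
    "table": {"namespace": "dsc", "name": "table1", "tableCoder": "PrimitiveType"},
    "rowkey": "key",
    "columns": {
        "key": {"cf": "rowkey", "col": "key", "type": "string"},
        "model_id": {"cf": "cf1", "col": "model_id", "type": "string"},
        "individual_id": {"cf": "cf1", "col": "individual_id", "type": "string"},
        "individual_id_proxy": {"cf": "cf1", "col": "individual_id_proxy", "type": "string"}
    }
})

# Build Rows that populate every column named in the catalog, including the
# rowkey column -- a rowkey the connector cannot resolve is what triggers
# the empty.tail in HBaseTableCatalog.initRowKey.
rows = [Row(key="id-001", model_id="m-42",
            individual_id="id-001", individual_id_proxy="p-001")]
df = sqlContext.createDataFrame(rows)

df.write.option("catalog", cat) \
    .format("org.apache.spark.sql.execution.datasources.hbase") \
    .save()

With all four catalog columns present in the DataFrame, save() completes without the exception.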

