Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Exception while using Spark HBase Connector on HDP2.6

Super Collaborator

Hi Guys,

I am using Spark 1.6.3 and HBase 1.1.2 on HDP 2.6. I have to stay on Spark 1.6 and cannot move to Spark 2. The connector jar is shc-1.0.0-1.6-s_2.10.jar. I am writing to an HBase table from a PySpark DataFrame:

cat = json.dumps({
    "table": {"namespace": "dsc", "name": "table1", "tableCoder": "PrimitiveType"},
    "rowkey": "key",
    "columns": {
        "individual_id": {"cf": "rowkey", "col": "key", "type": "string"},
        "model_id": {"cf": "cf1", "col": "model_id", "type": "string"},
        "individual_id": {"cf": "cf1", "col": "individual_id", "type": "string"},
        "individual_id_proxy": {"cf": "cf1", "col": "individual_id_proxy", "type": "string"}
    }
})

df.write.option("catalog",cat).format("org.apache.spark.sql.execution.datasources.hbase").save()
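A side note on the catalog dict above: in plain Python, a key that appears twice in a dict literal is silently collapsed to its last value, which a minimal standalone sketch (no Spark required) makes visible. The alternative logical name "key" used at the end is just an illustration, not a name from this thread:

```python
# Same "columns" structure as in the catalog above: "individual_id" appears twice.
# In a Python dict literal the duplicated key keeps only the LAST value, so the
# rowkey mapping ({"cf": "rowkey", "col": "key"}) is silently dropped.
columns = {
    "individual_id": {"cf": "rowkey", "col": "key", "type": "string"},
    "model_id": {"cf": "cf1", "col": "model_id", "type": "string"},
    "individual_id": {"cf": "cf1", "col": "individual_id", "type": "string"},
    "individual_id_proxy": {"cf": "cf1", "col": "individual_id_proxy", "type": "string"},
}

# After the collapse, no column maps to the rowkey family any more:
rowkey_cols = [name for name, m in columns.items() if m["cf"] == "rowkey"]
print(rowkey_cols)  # -> []

# Giving the rowkey mapping its own logical name (here "key", an assumption)
# keeps both mappings alive:
fixed_columns = dict(columns)
fixed_columns["key"] = {"cf": "rowkey", "col": "key", "type": "string"}
assert [n for n, m in fixed_columns.items() if m["cf"] == "rowkey"] == ["key"]
```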

The error is:

An error occurred while calling o202.save.
: java.lang.UnsupportedOperationException: empty.tail
	at scala.collection.TraversableLike$class.tail(TraversableLike.scala:445)
	at scala.collection.mutable.ArraySeq.scala$collection$IndexedSeqOptimized$super$tail(ArraySeq.scala:45)
	at scala.collection.IndexedSeqOptimized$class.tail(IndexedSeqOptimized.scala:123)
	at scala.collection.mutable.ArraySeq.tail(ArraySeq.scala:45)
	at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.initRowKey(HBaseTableCatalog.scala:141)
	at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.<init>(HBaseTableCatalog.scala:152)
	at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:209)
	at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.<init>(HBaseRelation.scala:163)
	at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:58)
	at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:259)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)

Please let me know if anyone has come across this.

1 ACCEPTED SOLUTION

Super Collaborator

Solved it. The catalog was missing values for the rowkey, as the error points out:

	at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.initRowKey(HBaseTableCatalog.scala:141)
	at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.<init>(HBaseTableCatalog.scala:152)
	at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:209)

I rebuilt each row as a Row object that included all of the DataFrame columns, including the rowkey column, and then the write worked.
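For anyone hitting the same empty.tail error: before calling save(), it can help to sanity-check that the catalog actually maps a column to the rowkey family and that every record supplies a value for each catalog column. A minimal sketch of such a check in plain Python (the catalog contents and helper name here are illustrative, not from a library):

```python
import json

def check_catalog_against_records(catalog_json, records):
    """Raise if the catalog maps no column to the rowkey family, or if any
    record is missing a value for one of the catalog's logical columns."""
    catalog = json.loads(catalog_json)
    columns = catalog["columns"]
    rowkey_cols = [name for name, m in columns.items() if m["cf"] == "rowkey"]
    if not rowkey_cols:
        # In this thread, a missing rowkey mapping is what surfaced as
        # java.lang.UnsupportedOperationException: empty.tail in initRowKey.
        raise ValueError("catalog maps no column to the rowkey family")
    for rec in records:
        for name in columns:
            if rec.get(name) is None:
                raise ValueError("record %r has no value for column %r" % (rec, name))

# Illustrative catalog using a distinct logical name ("key") for the rowkey:
cat = json.dumps({
    "table": {"namespace": "dsc", "name": "table1", "tableCoder": "PrimitiveType"},
    "rowkey": "key",
    "columns": {
        "key": {"cf": "rowkey", "col": "key", "type": "string"},
        "model_id": {"cf": "cf1", "col": "model_id", "type": "string"},
    },
})
check_catalog_against_records(cat, [{"key": "r1", "model_id": "m1"}])  # passes
```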
