Created 01-08-2018 11:01 PM
Hi Guys,
I am using Spark 1.6.3 and HBase 1.1.2 on HDP 2.6. I have to stay on Spark 1.6 and cannot move to Spark 2. The connector jar is shc-1.0.0-1.6-s_2.10.jar. I am writing a PySpark DataFrame to an HBase table:
cat = json.dumps({
    "table": {"namespace": "dsc", "name": "table1", "tableCoder": "PrimitiveType"},
    "rowkey": "key",
    "columns": {
        "individual_id": {"cf": "rowkey", "col": "key", "type": "string"},
        "model_id": {"cf": "cf1", "col": "model_id", "type": "string"},
        "individual_id": {"cf": "cf1", "col": "individual_id", "type": "string"},
        "individual_id_proxy": {"cf": "cf1", "col": "individual_id_proxy", "type": "string"}
    }
})
df.write.option("catalog", cat).format("org.apache.spark.sql.execution.datasources.hbase").save()
The error is:
An error occurred while calling o202.save.
: java.lang.UnsupportedOperationException: empty.tail
at scala.collection.TraversableLike$class.tail(TraversableLike.scala:445)
at scala.collection.mutable.ArraySeq.scala$collection$IndexedSeqOptimized$super$tail(ArraySeq.scala:45)
at scala.collection.IndexedSeqOptimized$class.tail(IndexedSeqOptimized.scala:123)
at scala.collection.mutable.ArraySeq.tail(ArraySeq.scala:45)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.initRowKey(HBaseTableCatalog.scala:141)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.<init>(HBaseTableCatalog.scala:152)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:209)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.<init>(HBaseRelation.scala:163)
at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:58)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Please let me know if anyone has come across this.
Created 01-09-2018 09:05 PM
Solved it. The RowKey was missing its values, as these frames of the stack trace point out:
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.initRowKey(HBaseTableCatalog.scala:141)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog.<init>(HBaseTableCatalog.scala:152)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:209)
I built the Row object so that it included all of the DataFrame columns, and then the write worked.
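One more thing that may be worth checking, offered as a possible contributing factor rather than a confirmed cause: in the catalog from the question, the key "individual_id" appears twice under "columns". A Python dict literal keeps only the last value for a repeated key, so the entry with "cf":"rowkey" is silently dropped before json.dumps ever sees it. The trace shows initRowKey calling tail on an empty sequence, which would be consistent with the connector finding no row key field. The sketch below is plain Python with no Spark needed; rowkey_columns is a hypothetical helper, and the "key" column name in the fixed catalog is an assumption about the DataFrame, not something stated in this thread.

```python
import json

# Catalog exactly as posted: "individual_id" is used twice under "columns".
posted = {
    "table": {"namespace": "dsc", "name": "table1", "tableCoder": "PrimitiveType"},
    "rowkey": "key",
    "columns": {
        "individual_id": {"cf": "rowkey", "col": "key", "type": "string"},
        "model_id": {"cf": "cf1", "col": "model_id", "type": "string"},
        "individual_id": {"cf": "cf1", "col": "individual_id", "type": "string"},
        "individual_id_proxy": {"cf": "cf1", "col": "individual_id_proxy", "type": "string"},
    },
}

# Assumed fix: give the row key its own, distinct DataFrame column name ("key" here).
fixed = json.loads(json.dumps(posted))
fixed["columns"]["key"] = {"cf": "rowkey", "col": "key", "type": "string"}


def rowkey_columns(catalog_json):
    """Hypothetical helper: names of catalog columns mapped to the 'rowkey' family."""
    columns = json.loads(catalog_json)["columns"]
    return [name for name, spec in columns.items() if spec["cf"] == "rowkey"]


print(rowkey_columns(json.dumps(posted)))  # [] : the rowkey entry was overwritten
print(rowkey_columns(json.dumps(fixed)))   # ['key']
```

Running a check like this on the catalog string before calling df.write makes an empty row key mapping visible on the Python side instead of surfacing as empty.tail from the JVM.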