DataFrame.collect() or show() throwing java.lang.IllegalArgumentException while reading data from HBase

Here is my Java code that reads data from an HBase table:

SQLContext sqlContext = new SQLContext(jsc);

// Map each DataFrame column to an HBase column as "<name> <type> <family>:<qualifier>"; ":key" is the row key
HashMap<String, String> colMap = new HashMap<String, String>();
colMap.put("hbase.columns.mapping",
        "KEY_FIELD STRING :key, rowCount DOUBLE p:rowCount, diskSize DOUBLE p:diskSize, refreshTime STRING r:lastRefreshTime");
colMap.put("hbase.table", hbaseTableName);

DataFrame df = sqlContext.read().format("org.apache.hadoop.hbase.spark").options(colMap).load();

df.registerTempTable("hbasedata");
DataFrame result = sqlContext.sql("SELECT rowCount FROM hbasedata");
long count = result.count();
System.out.println("Count of results: " + count);
result.show();
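
For context, the DOUBLE entries in hbase.columns.mapping tell the connector to decode p:rowCount and p:diskSize as raw 8-byte doubles (RawDouble in the trace below). As a quick sanity check, a minimal sketch against the df above prints the schema the connector derived:

// Should list rowCount and diskSize as double, KEY_FIELD and refreshTime as string
df.printSchema();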

The count is returned correctly, but when I try to view the data in the result DataFrame, it throws the exception below:

java.lang.IllegalArgumentException: offset (79) + length (8) exceed the capacity of the array: 80
    at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:629)
    at org.apache.hadoop.hbase.util.Bytes.toLong(Bytes.java:603)
    at org.apache.hadoop.hbase.util.Bytes.toDouble(Bytes.java:727)
    at org.apache.hadoop.hbase.types.RawDouble.decode(RawDouble.java:63)
    at org.apache.hadoop.hbase.spark.DefaultSourceStaticUtils$.getValue(DefaultSource.scala:920)
    at org.apache.hadoop.hbase.spark.HBaseRelation$$anonfun$14$$anonfun$apply$2.apply(DefaultSource.scala:279)
    at org.apache.hadoop.hbase.spark.HBaseRelation$$anonfun$14$$anonfun$apply$2.apply(DefaultSource.scala:278)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.hadoop.hbase.spark.HBaseRelation$$anonfun$14.apply(DefaultSource.scala:278)
    at org.apache.hadoop.hbase.spark.HBaseRelation$$anonfun$14.apply(DefaultSource.scala:277)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1888)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1888)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
17/10/06 11:09:10 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 4.0 (TID 4, localhost, executor driver): java.lang.IllegalArgumentException: offset (79) + length (8) exceed the capacity of the array: 80
    at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:629)
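
For reference on the trace: Bytes.toDouble delegates to Bytes.toLong, which needs exactly 8 bytes at the given offset, so "offset (79) + length (8) exceed the capacity of the array: 80" means the cell backing rowCount (or diskSize) is not an 8-byte serialized double; it may, for example, have been written as a string. Below is a minimal sketch to check the raw value lengths with the plain HBase client, assuming the family/qualifier from the mapping above and a placeholder table name:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckCellLengths {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("myHbaseTable"))) { // placeholder table name
            Scan scan = new Scan().addColumn(Bytes.toBytes("p"), Bytes.toBytes("rowCount"));
            int sampled = 0;
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    byte[] v = r.getValue(Bytes.toBytes("p"), Bytes.toBytes("rowCount"));
                    // Bytes.toDouble needs exactly 8 bytes; any other length reproduces the error above
                    System.out.println(Bytes.toStringBinary(r.getRow())
                            + " -> rowCount length = " + (v == null ? 0 : v.length) + " bytes");
                    if (++sampled >= 5) break; // sample a few rows only
                }
            }
        }
    }
}

If the reported lengths are not 8, the values were most likely not written with Bytes.toBytes(double); mapping those columns as STRING in hbase.columns.mapping, or rewriting the cells as 8-byte doubles, would be the usual fix.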

Appreciate any help.

Thanks,

Ravi Papisetti
