Member since: 09-19-2016
Posts: 4
Kudos Received: 0
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 3308 | 12-08-2016 08:13 PM |
02-19-2020 10:49 PM
With newer versions of Spark, the sqlContext is not loaded by default; you have to create it explicitly:

```
scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@6179af64

scala> import sqlContext.implicits._
import sqlContext.implicits._

scala> sqlContext.sql("describe mytable")
res2: org.apache.spark.sql.DataFrame = [col_name: string, data_type: string ... 1 more field]
```

I'm working with Spark 2.3.2.
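For what it's worth, the deprecation warning above is because SQLContext is superseded in Spark 2.x; the spark-shell also provides a SparkSession as `spark`, which avoids it entirely. A minimal sketch of the equivalent call, reusing the "mytable" table from the session above:

```
scala> // SparkSession is the Spark 2.x entry point; no explicit SQLContext needed
scala> import spark.implicits._
scala> spark.sql("describe mytable").show()
```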
12-08-2016 08:13 PM
So ... after a long hiatus. It turns out this is actually https://issues.apache.org/jira/browse/HBASE-13262

I was using hbase-client 0.96 with HBase 1.0.0 (CDH 5.5), and we had tables housing large XML payloads, which forced the bug to manifest when hbase.client.scanner.caching was set to a high value. There are multiple ways to fix this:

- Use hbase-client 0.98+, if you can afford to upgrade without impact
- Lower the value of hbase.client.scanner.caching in CM (this is what I ended up doing)
- Programmatically, use Scan.setCaching(int) and/or Scan.setMaxResultSize() to avoid the region skipping (see the sketch below)
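A minimal sketch of the programmatic route, assuming an HBase 1.0-era client API (ConnectionFactory/Table); the table name "mytable" and the caching/size values are just illustrative:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Scan}
import scala.collection.JavaConverters._

val conf = HBaseConfiguration.create()
val connection = ConnectionFactory.createConnection(conf)
val table = connection.getTable(TableName.valueOf("mytable")) // hypothetical table

val scan = new Scan()
scan.setCaching(100)                     // fetch fewer rows per RPC than a high cluster-wide default
scan.setMaxResultSize(2L * 1024 * 1024)  // cap the bytes returned per RPC (illustrative value)

val scanner = table.getScanner(scan)
try {
  scanner.asScala.foreach(result => println(result))
} finally {
  scanner.close()
  connection.close()
}
```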